UNIVERSITAT AUTÒNOMA DE BARCELONA
Non-Sponsored Technology
Adoption: How Network Effects
Create Dominant Standards
PUE Final Project
Josep Nueno Guitart
Tutor: Gabriel Izard Granados
Abstract
This project studies the role of network effects in the diffusion of technical innovations. To do so we study the transition from an incumbent video container format (AVI) to a newer, more efficient one (MP4) in the context of The Pirate Bay, a peer-to-peer file-sharing network, using a database of the 2.1 million files available in its catalog. To carry out our analysis we divide the peers active in The Pirate Bay into those that upload files and those that download them, and we find statistically significant evidence of network effects in the MP4 adoption shares for both groups. Lastly, we propose a theoretical model to explain some of the observed phenomena.
PROGRAMA UNIVERSITAT-EMPRESA
PRESENTATION OF THE FINAL DEGREE PROJECT
Josep Nueno Guitart, student of the twenty-third cohort of the Programa Universitat-Empresa, hereby submits in duplicate the final degree project entitled: Non-sponsored technology adoption: how network effects create dominant standards, with which he takes part in the nineteenth call of the Beques Universitat Empresa. He declares that he knows and accepts the terms of the call. He likewise declares that the final degree project he submits is unpublished, not plagiarized, and that he has respected the confidentiality commitment with the PUE companies.
He also authorizes the Programa Universitat-Empresa to publish his work.
Bellaterra (Cerdanyola del Vallès), 30 May 2013
Signed:
Table of contents
1. Introduction
2. Overview of The Pirate Bay and BitTorrent
3. Overview of digital audiovisual formats
4. Literature review
5. Econometric model
6. Data
7. Results
8. Theoretical model
9. Simulation
10. Discussion
11. Appendix 1 – Code used for database cleanup
12. Appendix 2 – Descriptive statistics for television shows
13. Appendix 3 – Stata output
14. Appendix 4 – MKV regressions and comments
15. Appendix 5 – Theoretical model derivations
16. Appendix 6 – Code used for the simulation
17. References
Index of figures
Figure 1. Revolutionary innovations in media technology
Figure 2. Revolutionary and Evolutionary innovations within media paradigms
Figure 3. Monthly uploaded AVI and MP4 files to TPB
Figure 4. Monthly format share of uploaded AVI and MP4 files to TPB
Figure 5. Broadband penetration and TPB use
Figure 6. Monthly uploads to TPB by format
Figure 7. HD television sets sold in 2005-2010
Figure 8. Monthly uploaded media files to TPB per codec
Figure 9. Dataset creation process
Figure 10. Average number of seeders per file for AVI and MP4 files
Figure 11. Average number of seeders per file (TV Shows sample)
Figure 12. Average number of comments per file (TV Shows sample)
Figure 13. Net share effects for = 0.9
Figure 14. Equilibrium adoption shares for firms varying and
Figure 15. Equilibrium adoption shares for firms varying n and fkmax
Figure 16. Equilibrium adoption shares for consumers varying and
Figure 17. Equilibrium adoption shares for consumers varying n and fkmax
Index of tables
Table 1. Descriptive statistics
Table 2. Random effects MLE regression of MP4 share for users
Table 3. Random effects MLE regression of AVI share for users
Table 4. Random effects MLE regression of MP4 share for uploaders
Table 5. Possible interactions between firms and consumers
Table 6. List of parameters used in the simulation
Table 7. List of variables used in the simulation
1. Introduction
After increasing significantly during the late 20th century, the rate at which technological innovations are generated has accelerated to unusually high levels over the last decade: digitalization creates very demanding environments, as the relative ease of transition between technologies results in a faster innovation-substitution cycle. Even if these changes rarely amount to a drastic improvement, the high frequency at which they take place makes them an interesting subject.
Varian and Shapiro (1999) categorize technical innovations depending on whether the new standard is compatible with the old one: if it is, they call it an "evolutionary innovation"; if it is not, a "revolutionary innovation". Looking at the literature on standards from this perspective, one notices that most work addresses the dynamics of revolutionary innovations (i.e. disruptive changes among standards), while there is much less focus on evolutionary ones (the substitution activity within a technology as many innovative alternatives fight to become the dominant standard). Another classification prevalent in the reviewed literature is the distinction between sponsored and unsponsored standards. The former are proprietary technologies sold by an agent capable of strategic maneuvering to maximize the chances of its standard becoming the dominant one. With unsponsored standards, on the other hand, no one besides the final consumers stands to gain anything from adoption. While some theoretical work on unsponsored standards exists, most notably Katz and Shapiro (1985, 1986) and Farrell and Saloner (1986), little empirical research has been conducted on them, and the emphasis has always been on competition rather than on the replacement of an incumbent format by a more effective new one.
One aspect on which most research agrees is the relevance of direct and indirect network effects in the adoption of new technologies. Direct network effects are a consequence of adoption by other users: the classical example is the increase in utility that users connected to a telephone network experience when an additional user decides to join. Indirect network effects are different in that, while their impact may also increase with user adoption, they are not a direct consequence of it. An example is software variety in different operating systems for personal computers: a higher
adoption rate for Mac computers increases the incentive of developers to create products
that work in that platform, which in turn increases the variety of applications available
for that operating system. This is a self-reinforcing loop, since more variety entails a
higher attractiveness for the platform which increases adoption by users. While harder
to identify than direct effects, indirect network effects play a huge role in determining
whether or not a specific technology will succeed in carving out a user base1.
Figure 1: Revolutionary innovations in media technology
Figure 2: Revolutionary (red) and Evolutionary (blue) innovations within media paradigms
This project intends to study the dynamics of adoption in the case of evolutionary
innovations, paying special attention to the impact of network effects. A case at hand is
1 For a detailed empirical investigation of the relevance of indirect network effects see Ohashi (2004)
the gradual replacement that has taken place in digital video formats, where Audio Video Interleave (AVI), a format originally designed by Microsoft but freely licensed, was gradually replaced by ISO's MPEG-4 Part 14 (MP4). This change was by no means revolutionary, since it did not challenge the governing technical paradigm (digital video, see Figures 1 and 2), but it did carry incremental improvements to the quality and usefulness of the contents offered. Figures 3 and 4 show the monthly count and share of AVI and MP4 multimedia files uploaded to The Pirate Bay which, for the time being, can be taken as the dominant catalog of files shared over BitTorrent, a peer-to-peer file-sharing protocol. We can see how during the second half of the '00s MP4 gained ground over AVI as the preferred format in which to distribute media files, becoming the dominant one by mid-2012. This setting is ideal for studying how non-disruptive, non-sponsored technologies bid for dominance within a technical paradigm, in this case digital video. Furthermore, the exponential growth of MP4 is suggestive of self-reinforcing dynamics, which in turn points to network effects as a driver of adoption. Files shared over BitTorrent are ideal for studying the role of indirect network effects, since peers on the network can be divided into file uploaders (a small subset of total peers) and regular users who only download files; by studying how the adoption decisions of one group impact those of the other we can separate and assess the impact of direct and indirect network effects. Additionally, since the studied technologies are freely used in this context, we can zero in on network effects while paying only limited attention to strategic maneuvering by the original designers or other parties.
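The monthly counts and format shares plotted in Figures 3 and 4 reduce to a group-and-normalize computation over the dump. A minimal sketch with toy rows (the actual field names in the TPB database dump may differ):

```python
from collections import Counter, defaultdict

# Toy stand-in for rows of the cleaned TPB dump: (upload month, container).
uploads = [
    ("2008-01", "avi"), ("2008-01", "avi"),
    ("2008-01", "mp4"), ("2008-02", "mp4"),
]

# Count uploads per month and per container format.
counts = defaultdict(Counter)
for month, fmt in uploads:
    counts[month][fmt] += 1

def share(month, fmt):
    """Share of a container format among that month's uploads."""
    total = sum(counts[month].values())
    return counts[month][fmt] / total

print(counts["2008-01"]["avi"])           # 2
print(round(share("2008-01", "mp4"), 2))  # 0.33
```

Running the same aggregation over all 2.1 million rows of the dump yields the series behind Figures 3 and 4.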
Sections 2 and 3 give a quick overview of the workings of BitTorrent, The Pirate Bay and video container formats (AVI and MP4). Section 4 reviews the literature written on the subject. Section 5 presents an econometric model to explain the changes in adoption of video formats by peers, and section 6 describes the data manipulations that were carried out in order to test the model. Section 7 reports the results, section 8 proposes a theoretical model to explain them, and in section 9 a simulation is run in order to better understand its dynamics. Finally, section 10 discusses the findings of the project.
Figure 3: Monthly uploads to TPB of AVI and MP4 files (series: MP4, AVI; monthly counts, Aug 2004 – Jan 2013)
Figure 4: Monthly share of MP4 and AVI for media files uploaded to TPB (series: MP4, AVI; monthly upload share 0–100%, Aug 2004 – Jan 2013)
2. Overview of BitTorrent and The Pirate Bay
BitTorrent is a protocol that facilitates peer-to-peer sharing of large files. Peer-to-peer downloading differs from traditional client-server downloading in that the transfer of a file is not handled by a single central server, but instead is carried out by a network of computers running peer-to-peer file-sharing software (a client). When a specific file is downloaded from a peer-to-peer network, each computer in the network that has the requested file transfers a small part of it, which greatly improves efficiency, both in terms of congestion and in terms of download time. BitTorrent was devised with the goal of incentivizing sharing and minimizing the free-rider behavior that is so prevalent in peer-to-peer sharing (mainly, disconnecting from the network as soon as the download is complete). It does so through a "tit-for-tat" system which ranks each peer by the amount of time it remains connected to the network after finishing its downloads, with future download speeds improving along with the ranking. Computers that are sharing the complete file are known as "seeders", and their number is an indicator of the availability of a file on the network. Since upload speed is significantly lower than download speed for most peers, BitTorrent allows parallel download of file chunks, thereby bypassing the bandwidth bottleneck that other peer-to-peer networks face.
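The bandwidth argument can be made concrete with back-of-envelope arithmetic; all figures below are illustrative assumptions, not measurements:

```python
# Compare download time from a single server against a BitTorrent swarm
# where every peer re-shares the chunks it already holds.
file_mb = 700          # size of a typical video file, in megabytes
server_up_mbps = 100   # upload capacity of a single central server
peer_up_mbps = 1       # typical peer upload speed (well below download speed)
n_peers = 500          # number of peers requesting the file

# Client-server: all n downloads share the server's upload pipe.
t_server = file_mb * 8 * n_peers / server_up_mbps

# Swarm: aggregate upload capacity grows with the number of peers.
t_swarm = file_mb * 8 * n_peers / (peer_up_mbps * n_peers)

print(t_server)  # 28000.0 seconds to serve everyone
print(t_swarm)   # 5600.0 seconds: capacity scales with swarm size
```

The swarm's advantage grows with the number of peers, which is precisely the direct network effect exploited later in the analysis.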
All these factors have contributed to a steady increase over the last decade in the number of users and companies using the BitTorrent protocol to share files, legally or otherwise. As Figure 5 illustrates, the advent of broadband for mass consumption has facilitated the exchange of increasingly large files, and since the early 2000s sharing video files (be it movies, video clips or television shows) has become commonplace for many internet users. Following this surge in popularity, websites appeared that indexed the files in the BitTorrent network and provided the information necessary to access them (Torrent Files and Magnet Links2).
2 Torrent Files contain data about the locations of a file within the BitTorrent network while Magnet
Links contain a unique identifier that is derived from the contents of the file. Both can be used to start peer-to-peer downloads in the BitTorrent protocol.
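The "unique identifier derived from the contents of the file" mentioned in the footnote is the BitTorrent info-hash: the SHA-1 digest of the bencoded info dictionary of the torrent. A minimal sketch using a toy bencoded payload (not a real torrent, which would also carry piece hashes and piece length):

```python
import hashlib

# Toy bencoded info dictionary (keys appear in sorted order, as bencoding
# requires); the name and length are made up for illustration.
bencoded_info = b"d6:lengthi734003200e4:name8:show.mp4e"

# The info-hash uniquely identifies the file's contents on the network.
info_hash = hashlib.sha1(bencoded_info).hexdigest()
magnet = f"magnet:?xt=urn:btih:{info_hash}"

print(len(info_hash))  # 40 hex characters (a 160-bit SHA-1 digest)
print(magnet.startswith("magnet:?xt=urn:btih:"))  # True
```

Because the identifier is derived purely from the content, any peer holding the same file can serve it, regardless of where the magnet link was found.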
Figure 5: Broadband penetration and TPB use (left axis: broadband penetration in OECD countries, %; right axis: bi-quarterly file uploads to TPB). Source: OECD, own preparation.

One of the biggest repositories is The Pirate Bay (TPB), established in 2004 and indexing, as of February 2013, more than 2 million files. Throughout its history TPB has frequently been in the spotlight due to its status as one of the biggest networks providing access to pirated content, which has made it the target of much attention from intellectual property enforcers, who have repeatedly taken legal action against it. Despite their efforts the website remains active, although it has been forced to change its country domain several times (currently all its traffic is routed from the island of Saint Martin). Its long trajectory and popularity make it an ideal candidate for studying how the MP4 format has come to dominate most video downloads. The website has a page for each file in its catalog, which contains information about the file and allows users to leave comments. Adding to its attractiveness, a dump of its database was carried out in February 2013 by programmer Karel Bilek, who publicly posted the results, making them available for download by anyone interested.

The motivations and incentives of peers operating in a BitTorrent network are clearer for those that download files than for those that upload them. The incentives of the downloading side hardly need explaining, since there is an obvious benefit to obtaining media for free and at a decent speed. The uploading side, however, the one that creates the file and prepares it for sharing, invests significant effort apparently without any compensation other than the occasional thank you from people that download. In some cases Torrent files contain text documents that lead to the uploader's website, generating traffic which may translate into ad revenue. Other files may contain malware, or may require the completion of surveys or the disclosure of personal information in order to unlock the contents. Finally, some peers seem to operate out of idealism, their final goal being the free circulation of information, whatever its shape or content. Regardless of the motivation behind an upload, one thing is clear: the more a file is shared the better. All peers that upload files wish to maximize their impact by making sure the files will have a wide distribution. Format is one of the choices that factors into the success of a file, and peers therefore decide which one to adopt with this maximization goal in mind. For clarity, during the rest of the project those users that upload files will be referred to as "uploaders" while those that download them will simply be "users".
The fact that downloading a file over BitTorrent increases the download speeds of that file for all users makes the system ideal for studying direct network effects. Indirect network effects can also be assessed, in particular the effect that the diversity of contents available in either format has on the adoption rates of each type of peer.
3. Overview of digital audiovisual formats
In order to play a digital video it first needs to be contained in a wrapper, or container format. This wrapper holds data about the video file which the media player, the application in charge of turning the digital information contained in the file into actual images and sounds, needs in order to run the video. There are several competing formats available, each with its strengths and weaknesses. Figure 6 shows the number of monthly uploads for four of the most frequently used wrappers, which for convenience will be called "the big four": MKV, WMV, MP4 and AVI. Despite this variety, it quickly becomes clear that two formats have held dominating positions over the decade in terms of the files available for download on TPB: Microsoft's Audio Video Interleave (AVI) and ISO's MP4. The fate of WMV, another Microsoft wrapper, is very closely tied to that of its more popular and successful cousin AVI. Furthermore, due to methodological constraints, WMV was not suitable for our study since, as will be explained later on, the empirical part centers on format changes for television shows, of which only a very small part is wrapped in WMV (a disproportionate amount of WMV files are "adult" content, for some reason). MKV, on the other hand, also experienced a big increase in usage during the same time period. While this project focuses mostly on the competition between AVI and MP4, we provide some estimates and comments on MKV adoption in Appendix 4, and its usefulness for future research will be examined in the discussion section.
For the better part of the last decade AVI was in a dominant position, with most audiovisual files exchanged on BitTorrent wrapped in that format. An AVI file is divided into three parts, also known as "chunks": the first contains metadata, such as the video's definition (width and height) or frame rate. The second contains the audiovisual content proper, encoded using a software library known as a codec: before a video is packaged into its container it must be encoded, i.e. compressed into a digital representation that a player can decode. There is a large variety of codecs and many are freely available to the public. The final chunk is optional and contains additional metadata on the file.
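AVI's chunked layout follows Microsoft's RIFF container, whose 12-byte outer header ('RIFF' tag, little-endian payload size, 'AVI ' form type) can be parsed directly. A sketch with a fabricated minimal header for illustration:

```python
import struct

def read_riff_header(data: bytes):
    """Parse the 12-byte outer header of a RIFF/AVI file:
    a 4-byte 'RIFF' tag, a little-endian 32-bit payload size,
    and a 4-byte form type ('AVI ' for Audio Video Interleave)."""
    tag, size, form = struct.unpack("<4sI4s", data[:12])
    if tag != b"RIFF":
        raise ValueError("not a RIFF file")
    return form.decode("ascii").strip(), size

# Fabricated minimal header (no real chunks follow it).
fake = b"RIFF" + struct.pack("<I", 4) + b"AVI "
print(read_riff_header(fake))  # ('AVI', 4)
```

The database cleanup described later only needs the file extension, but a check like this is how a player distinguishes a genuine AVI wrapper from a mislabeled file.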
Figure 6: Monthly uploads to TPB for the big four (series: MP4, AVI, WMV, MKV; monthly counts, Aug 2004 – Jan 2013)

Despite its initial success, the AVI format has several limitations, especially regarding compression and aspect ratio, as well as a lack of standardization for features such as the time code, which are important for professional use of the file. Some competing formats have solved these issues, allowing for more efficient file transfer and
manipulation. One of these is ISO's MP4 format (also known as MPEG-4 Part 14), developed as a version of Apple's QuickTime File Format. Although it was first published in 2001, it didn't start enjoying success until the mid-'00s, mostly due to the popularization of High Definition (HD) media. Before HD the most successful codec was DivX, which allowed the efficient compression of large videos into a digital file. Most AVI files had their second chunk encoded with DivX, so the popularity of the codec fueled the diffusion of the format. However, one of the shortcomings of the public version of the codec was its inefficiency when it came to encoding HD video. There were many codec alternatives (such as x264), although at first they didn't enjoy much success. This is attributable to the lack of demand for HD files, since most monitors at the beginning of the 2000s were not able to reproduce this content. However, as the decade advanced, the widespread adoption of HD television sets and screens (see Figure 7) changed this situation, greatly increasing the demand for HD media files.
Figure 7: HD television sets sold in 2005 – 2010
Source: GfK Retail and Technology, July 2010
The fact that many of these television sets allowed the reproduction of encoded digital files increased the demand for digital media. However, this increase in demand did not spread evenly among codecs and, as can be appreciated in Figure 8, those better suited to encoding HD files (in particular x264) absorbed most of the bump.
While MP4 was not the only container format capable of carrying x264-encoded video files, it benefited greatly from the switch to the new codec since, due to the factors examined later in this section, it was in an advantageous position to exploit any weakness AVI showed. The similarity between the x264 trend in Figure 8 and the MP4 trend in Figure 3 illustrates a very strong tie between the x264 codec and the format.

Another force that played an important role in the adoption of MP4 was the advent of smartphones and digital music. One of the first companies to take advantage of the improvements MP4 offered was Apple, which created an audio codec (Apple Lossless) that stored audio data in an MP4 wrapper and was used for music offered through the iTunes Store. Other portable devices also offered compatibility with the standard, and the advent of mobile computing entailed a large increase in the installed base of devices compatible with the new format. Apple later released the source code of the Apple Lossless codec, making it open source and royalty free and further fuelling its growth. At that point most of the content being distributed in MP4 was music, and despite its advantages over AVI the format was hardly used for packaging audiovisual contents: mobile devices had neither the memory nor the resources necessary to play those videos, and HD screens were not yet popular. Furthermore, on the TPB site the huge popularity of AVI had locked users and uploaders into the incumbent format. The release of additional MP4-compatible devices such as the PlayStation 3 and the incipient penetration of media centers into households further increased the attractiveness of the format.

Figure 8: Number of monthly uploaded files to TPB per codec (series: DivX, x264; monthly counts, Apr 2004 – Jan 2013)
This account summarizes the external trends that explain the adoption of MP4 by users. Without them, network effects alone would not have been sufficient to move the user base of TPB and the wider BitTorrent community away from the old format. Later parts of the project take this into account, even though our focus remains on the network effects.
4. Literature review
The first literature on technology adoption was written in the 1980s and focuses heavily on the role network effects play in the process. In their seminal paper, Katz and Shapiro (1985) described how consumption externalities generated by the users of a product impact its demand. They identify two possible types of consumption externalities: the first, corresponding to direct network effects, is a "direct physical effect of the number of purchasers on the quality of the product", as in the telephone case described in the introduction. Indirect network effects are variables that, while possibly related to the number of users of a product, are not exclusively dependent on it: market share or the cost of post-sale services could be instances of these. They go on to build a model in which firms compete to attract consumers to their networks through pricing, and find that under certain circumstances the optimal solution is to allow compatibility between networks, amplifying the intensity of network effects and thereby maximizing adherence by consumers.
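The fulfilled-expectations logic of such models can be sketched as a fixed-point computation: a consumer with standalone valuation r adopts if r plus the network benefit exceeds the price, and in equilibrium the expected network size equals the realized one. The linear utility, the uniform taste distribution and all parameter values below are illustrative assumptions, not the paper's specification:

```python
# Toy fulfilled-expectations adoption share in the spirit of Katz-Shapiro:
# a consumer with taste r (uniform on [0, 1]) adopts iff r + v * n_e >= p.

def adoption_share(n_expected, v=0.5, price=0.8):
    """Share of consumers adopting given an expected network size."""
    threshold = price - v * n_expected   # taste of the marginal consumer
    return min(1.0, max(0.0, 1.0 - threshold))

def equilibrium(v=0.5, price=0.8, iters=200):
    """Iterate expectations until the adoption share is self-fulfilling."""
    n = 0.5
    for _ in range(iters):
        n = adoption_share(n, v, price)
    return n

n_star = equilibrium()
print(round(n_star, 3))  # 0.4: the share at which expectations are fulfilled
```

Raising v strengthens the externality and shifts the fixed point upward, which is the self-reinforcing mechanism the literature emphasizes.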
Further developing this approach, Katz and Shapiro (1985, 1986 and 1995) extend their model and apply it to technology adoption. They introduce the distinction between sponsored and unsponsored standards and identify the inefficiencies that can arise from adoption: specifically, they show that the incumbent technology has an advantage over the new one in the case of unsponsored standards, while a sponsored standard will have a strategic advantage over unsponsored ones, even if it is inferior, since its sponsor can behave strategically to make sure its technology is the one that finally succeeds. This process, in which a superior technology is displaced by an inferior or less mature one, is called excess momentum. They go on to describe the possible maneuvers a sponsor can use in order to gain the upper hand, such as committing to future prices or, in the software-hardware paradigm, integrating vertically.
Another cornerstone of the late-80s technology adoption literature is the work of Farrell and Saloner (1985, 1986). Their model describes the adoption of an unsponsored standard and includes two additional factors in the consumers' choice: first, they assume consumers form expectations as to which standard will succeed, which can lead to a bandwagon effect in which the choice of the first consumer creates a cascade that makes all subsequent adoption decisions identical to the first one. Second, they introduce the notion of an installed base, which reflects the number of users committed to a standard on day zero. Under uncertainty, this installed base can trap the market in an old, inferior standard since, despite being individually interested in adopting, consumers do not dare to do so because they do not know what the choices of subsequent adopters will be, a process they call excess inertia.
A final set of models proposed in the 80s focused on increasing returns and path dependence, with Arthur (1987, 1990 and 1994) as one of the main exponents of the current. In the series of papers compiled in the book Increasing Returns and Path Dependence in the Economy he provides models that illustrate how historical accidents may explain why an inferior technological standard ends up being adopted over a superior one. The main analytical tool he uses is a statistical model known as a Polya urn process: "it can be pictured by imagining a table to which balls are added one at a time; they can be of several possible colors – white, red, green or blue. The color of the ball to be added next is unknown, but the probability of a given color depends on the current proportion of colors on the table"3. Arthur goes on to describe several economic processes governed by similar self-reinforcing dynamics, such as the choice of geographical location by firms, or technology adoption. In the case of technology adoption he defines the adoption choice as a random walk with critical bounds: once the process crosses a threshold, all future choices go to the same technology. He demonstrates the existence of several stable equilibria for such problems and examines how historical accidents condition which one is eventually reached. His models fit into the Evolutionary Economics school and are a great introduction to path dependence, a notion that has gained a lot of relevance in development, spatial and financial economics.

3 Arthur, Brian (1994), Increasing Returns and Path Dependence in the Economy, p. 6.
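Arthur's urn is straightforward to simulate; the sketch below (the step count and seeds are arbitrary choices) shows the reinforcement mechanism behind this path dependence, with each run locking onto its own limiting share:

```python
import random

def polya_urn(steps, seed):
    """Polya urn: starting with one white and one red ball, repeatedly add
    a ball whose color is drawn with probability equal to current shares."""
    rng = random.Random(seed)
    white, red = 1, 1
    for _ in range(steps):
        if rng.random() < white / (white + red):
            white += 1
        else:
            red += 1
    return white / (white + red)

# Different histories (seeds) settle on different limiting shares:
# an early accident conditions which equilibrium is reached.
shares = [polya_urn(10_000, seed) for seed in range(3)]
print(all(0.0 <= s <= 1.0 for s in shares))  # True
```

Reading "white" as MP4 uploads and "red" as AVI uploads gives the flavor of the self-reinforcing format dynamics studied in this project.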
During the 1990s and 2000s most of the literature turned to the study of sponsored standards and technologies and to the strategic aspects of the problem. Some papers follow the tradition of the earlier studies while others develop their theories within the two-sided-markets approach. An example of the former is Besen and Farrell (1994), who describe different competitive strategies in standard setting, the mechanisms by which a firm may try to steer the market in its favor, and how market structure influences the outcome. They show that when firms are similar they choose the same compatibility strategy and therefore facilitate the emergence of a single, consolidated standard. However, if firms are dissimilar a standards battle is likely to occur: bigger firms may want to prevent new entrants from joining their networks. In the same line, Götz (1999) analyzes the adoption and diffusion of a technology in markets with monopolistic competition and shows that in a non-cooperative setting identical firms may adopt a new technology at different dates. For non-identical firms he assigns a rank to each firm's good, which alters consumers' demand for it, and shows that bigger firms have a bigger incentive to adopt. This project proposes a variant of Götz's model in which consumers, and not only firms, also face an adoption choice (see Section 8).
Varian and Shapiro (ibid.) and Varian et al. (2004) give a business-oriented overview of the state of the art in the standards literature and create many useful classifications of technical innovations. Their focus is on strategic maneuvering and on discussing past cases, specifically standards wars such as VHS vs. Beta or standard gauge vs. broad gauge in the early days of railroads.
A last, and quite recent, contribution to this line of inquiry is the handbook chapter by Farrell and Klemperer (2006), which focuses mainly on network externalities and studies competition under switching costs and network effects. They show how these effects can lock customers into their early choices and how arrangements that are suboptimal from a social welfare point of view can prevail under these conditions.
Literature on platform competition in two sided markets is also relevant when
studying technology adoption, in particular in competition between sponsored non-
19
compatible standards, although the bulk of it does not deal with it explicitly. This is
illustrated by the change in language, with platform being used more often than
technology or standard. Still models of two sided markets share many similarities with
models of technology adoption, as in essence both deal with network effects and the
incentives they generate. Rochet and Tirole (2002, 2006) have published several of their
works on the dynamics of competition in two sided markets, with a special focus on
pricing. In their models, platforms compete to attract a demand which is split on two
sides, with at least one of them experiencing a positive membership externality when
additional customers join the opposite side. They demonstrate that the profit-
maximizing decision of a monopolistic platform is to subsidize the side of the demand
that has higher price elasticity and overbill the inelastic side. Armstrong (2002)
develops his own model and pays special attention to competition between platforms.
He also allows the demand side to perform a broader range of behaviors; specifically, he
allows those agents to multihome (i.e. use several platforms at the same time). In the
technology adoption context his model could be applied to
Empirical literature
Much of the empirical literature on technology adoption has been produced within the
Marketing field (and a surprising amount of it focuses on adoption of electronic
payment systems), and tends to be based on attitudinal rather than hard data. There is a
lot of variety in the frameworks these researchers use, but most build their
investigations around the Technology Acceptance Model (TAM). Proposed by Davis
(1989), TAM is applied mostly in research on the diffusion of Information Technology, and
focuses on users' perception of what Davis considers the two main drivers behind
an adoption decision: ease of use and usefulness. The model has undergone several
extensions and modifications and has been widely used since its introduction.
Some research has also been dedicated to the exploration of network effects using
data from other sources. For example, Rysman (2003) studies competition between
networks by examining how Yellow Pages directories compete. His final goal is to
determine whether or not standardization would be preferable to competition from a
social welfare point of view, since standardization would maximize the magnitude of
network effects. His conclusion is that, in the specific case of Yellow Pages directories,
competition is preferred to standardization.
As for technology adoption, Ohashi (2003) studies the competition
between Beta and VHS between 1978 and 1986, and the impact network effects may have had
in the final victory of VHS. In order to identify the network effects he incorporates into
the consumers' utility function an installed base variable for each of the competing
standards. He then estimates adoption using a nested logit model, in
which he first estimates the likelihood of adoption of any VCR device, and then the
likelihood of choosing either VHS or Beta. His model allows him to run simulations
with which he can contrast hypotheticals, and one of the most remarkable results he
obtains is that the success of VHS would have been unlikely had its price been higher
during the first stages of competition.
Along this line of work, Clements and Ohashi (2004) study the role of indirect
network effects on the videogame market in the United States. Videogame platforms are
an instance of sponsored non-compatible technological standards, and securing a broad
variety of software products early on in order to get a large installed base is one of the
main concerns of platform rivals. The paper goes on to model the strategic interactions
between platforms and software providers.
5. Econometric Model
The objective of the empirical part of this project is to determine the impact of direct
and indirect network effects on the substitution of AVI by MP4 for audiovisual file-
sharing in TPB. In order to do so, two models have been developed to explain the
variations in the share of adoption of MP4 files (one for each side of the BitTorrent
ecosystem).
Direct network effects are a consequence of adoption of MP4 by other users: as
explained before, download speed for a specific file increases with the number of users
that have a copy of it, which increases the attractiveness of a format as it gains users.
However, guessing which files each user is interested in is impossible with the data
accessible to us, so we decided to cluster our sample by television show: we
assume that someone who downloads an episode of a television show is more likely to
be interested in other episodes, and he or she stands to benefit from the direct network
effect generated by larger shares of a specific format within that subset of files.
Furthermore, clustering by television show holds the additional advantage of allowing
us to follow a group of users and uploaders over time, since new episodes are uploaded
into TPB after they are broadcast through traditional television, so dummy variables for
unobserved demographic characteristics and time can be added.
As for the indirect network effect, we will model it as a function of the variety of
audiovisual media being offered in MP4. In order to parameterize this variety we will
use the lagged share of audiovisual media being offered in that format, or to put it
another way, the percentage of uploaded media files in MP4 up to that date.
With this in mind we define the following MP4 adoption share function:

ShareUsers_{TV,t} = β0 + β1·IBUp_{t−1} + β2·TVIBUs_{TV,t−j} + δ_TV + δ_t + ε_{TV,t}

where ShareUsers_{TV,t} is the MP4 adoption share among users for files uploaded at
date t for television show TV; TVIBUs_{TV,t−j} is the format adoption share of users for
previous episodes of that television show; IBUp_{t−1} is the lagged proportion of media
files being offered in MP4, and δ_TV and δ_t are television show and time dummies. We
decided to lag IBUp since, given the large number of media files available for
download, it is unlikely that users would react immediately to variations in the format
composition of the aggregate total of files available.
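As an illustration of how a specification of this kind can be estimated, the sketch below fits it by plain least squares on simulated data. All variable names, parameter values, and the simulated coefficients are hypothetical; the actual estimation in Section 7 uses a random effects MLE in Stata.

```python
import numpy as np

# Illustrative sketch (not the thesis code): estimating
#   Share = b0 + b1*IBUp_lag + b2*TVIBUs_lag + show dummies + year dummies
# by least squares on simulated data. All names and numbers are hypothetical.
rng = np.random.default_rng(42)
n_shows, n_months = 10, 24
n = n_shows * n_months
show = np.repeat(np.arange(n_shows), n_months)      # television show id
year = np.tile(np.arange(n_months) // 12, n_shows)  # yearly dummy level
ib_up_lag = rng.uniform(0.0, 0.5, n)   # lagged aggregate MP4 upload share (indirect effect)
tv_ib_us = rng.uniform(0.0, 0.8, n)    # per-show MP4 user installed base (direct effect)
share = np.clip(0.05 + 0.4 * ib_up_lag + 0.3 * tv_ib_us
                + rng.normal(0.0, 0.03, n), 0.0, 1.0)

# Design matrix: constant, the two network-effect regressors, and dummies
# (dropping one level of each categorical to avoid perfect collinearity).
show_dummies = (show[:, None] == np.arange(1, n_shows)).astype(float)
year_dummies = (year[:, None] == np.arange(1, year.max() + 1)).astype(float)
X = np.column_stack([np.ones(n), ib_up_lag, tv_ib_us, show_dummies, year_dummies])
beta, *_ = np.linalg.lstsq(X, share, rcond=None)
b1, b2 = beta[1], beta[2]   # estimated network-effect coefficients
print(round(b1, 2), round(b2, 2))
```

The same design matrix can be reused for each of the big four formats by swapping in the corresponding dependent variable.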
Time dummies are included to account for unobserved changes in the utility of
MP4. As was discussed in Section 3, starting in the mid-'00s MP4 became more
attractive thanks to its compatibility with HD content and mobile devices. Lacking any way
to model these effects we include yearly time dummies as a means to capture the
increase in popularity the format had during the decade for reasons other than network
effects. Finally a dummy is also included for television shows, which is useful mainly to
cluster the sample in differentiated user groups and to a lesser extent to control for
unobserved demographic characteristics affecting the adoption decision.
The model estimating the adoption rate among uploaders of a specific television
show is similar, only in this case both the direct and indirect network effects come from
users. This is based on the assumption that uploaders don't experience any direct
benefit resulting from other uploaders switching to MP4 and instead benefit from it
indirectly through a ricochet effect: more uploads in MP4 mean more users in MP4,
which in turn makes the format more attractive to uploaders. In order to capture it we
use the total installed base share for users at time t, taking into account all audiovisual
media. For the modeling of the direct network effect we proceed in the same fashion as
in the users’ case, taking into account only the user installed base within a single
television show. Thus we specify the following model for upload share:

ShareUploaders_{TV,t} = β0 + β1·IBUs_{t−1} + β2·TVIBUs_{TV,t−j} + δ_TV + δ_t + ε_{TV,t}

where IBUs_{t−1} is the share of installed base MP4 has among all users downloading
media files and TVIBUs_{TV,t−j}, δ_TV and δ_t are the same as in the user specification
(installed base share for a specific television show, television show dummies and time
dummies, respectively).
6. Data
The TPB database used was compiled by Karel Bilek4 by running a Perl script on the
TPB website. The whole process took "about six months" and, according to his own
account, between 100 and 300 Torrent files are missing (negligible compared to
the more than 2 million Torrent files he did manage to compile). The data
was stored in an XML format and the uncompressed file weighs 4.4 GB. For each
Torrent file the dump contained several fields, all of which were discarded except the
following:
Identification number: a unique identifier for each file.
Title: the title of the file, often specifying the format as well in the case of media.
Seeders: the number of users that have a full copy of the file and are sharing it.
Upload date: the date on which the Torrent file was created.
Information: comments left by the uploader, which were also checked for file format.
User comments: comments left by downloaders of the file.
Due to the large size of the database most of the cleaning and sampling was carried out
with the UNIX terminal. A first step consisted in creating a new XML field containing
the number of comments each Torrent file had (which, although available on the TPB
website, was not included in Bilek's dump) and removing the actual comments in order
4 Karel Bilek’s Github page: http://runn1ng.github.io/piratebay.html
to make the file more manageable. The resulting dataset, which we will call the "global
dataset", accounted for all 2 million observations. The next step was identifying the
format of each file: the format of many of them is not specified on the TPB website, and
several formats are not relevant to this project (such as audio formats), so all those
instances had to be removed. At the end of the process, files whose format was identified
as belonging to one of the big four were assigned to the "media dataset" and were used
to estimate the installed base each format had among uploaders and users. The media
dataset included 284,858 observations with upload dates ranging from April 2004 to
February 2013. A final dataset was created for all the Torrent files of episodes of 50 TV
shows with original airdates in the time range covered by the media dataset. For a
television show to be selected it needed to have broadcast at least two seasons in the
time period studied and each season needed to have at least 13 episodes (so miniseries
were excluded).5
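The format-identification step can be sketched as follows. The regular expression and function are illustrative; the actual cleanup was done with UNIX shell tools on the XML dump.

```python
import re

# Hypothetical sketch of the format-identification step. A file is tagged with
# one of the "big four" containers when its title or uploader notes mention it.
BIG_FOUR = re.compile(r"\b(avi|mp4|mkv|wmv)\b", re.IGNORECASE)

def detect_format(title, info=""):
    """Return the first big-four format mentioned, or None if unidentified."""
    match = BIG_FOUR.search(title) or BIG_FOUR.search(info)
    return match.group(1).upper() if match else None

print(detect_format("Show.S01E05.720p.HDTV.x264.mkv"))   # MKV
print(detect_format("Some Album 2009 [320kbps mp3]"))    # None
```

Files for which neither the title nor the uploader notes name a format would remain unidentified and be dropped, as described above.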
Figure 9: Dataset creation process
• Raw database: Karel Bilek's extraction; 2.2 million files
• Global dataset: replaced comments by comment count, eliminated useless fields; 2.2 million files
• Media dataset: contains AVI, MP4, MKV and WMV files; 284k files
• TV Shows dataset: files of episodes (50 TV shows); 9k files

5 Samples of the code used during the cleanup process can be found in Appendix 1

The original metric that was to be used to assess the adoption rate of
MP4 among users was the number of seeders each file had. However, the data didn't
actually reflect how many users had downloaded the Torrent at any point in time;
instead it reflected the number of users that had a copy of it at the moment at which the
database was extracted: the seeders field contains transversal information corresponding
to the extraction date (between the end of 2012 and the beginning of 2013). As can be
seen in Figure 10, the average number of seeders per file declines very fast, with the
number of seeders for files six months old barely representing 10% of that for more
recent files, leaving many observations with one seeder or none. This was
problematic because it greatly increased the volatility of the variable containing the user
adoption share.
To tackle this, the workaround we found is to use the number of
comments users left on the page of each Torrent file as a proxy for its popularity. We
based this on the assumption that users were equally likely to leave comments on any
Torrent page, an assumption that was further reinforced when the sample was homogenized
by restricting it to television shows. Unlike seeders, once a comment is left on a page it
remains there indefinitely, since forum moderation in TPB is virtually nonexistent.
Fig. 10: Average number of seeders per file for MKV, WMV, AVI and MP4 files

With the aggregate totals for file count and comment count for AVI and MP4
files, the installed base shares for each format at each point in time and for each peer
group were calculated. The time unit used is months, since our intent was to set up a
panel and shorter time intervals added a lot of noise and significantly hurt its
balance. Since we consider adoption decisions to be permanent, in order to estimate
the user installed base we used the cumulative share of comments left on files belonging
to either format. Using cumulative values has the advantage of creating more robust
variables and gives a more accurate estimate of the current installed base share. This
approach is easier to justify for uploads than for users, since in the former case it is an
actual reflection of the share of files available in each format at moment t. In the case of
users the approach is valid as long as we only use the share and not the absolute values,
and as long as the adoption decision is not reversible.
For users the installed base share at moment t was calculated using the Media
Dataset as

IBUs_t = (Σ_{i=0}^{t} CommentsMP4_i) / (Σ_{i=0}^{t} CommentsMedia_i)

where CommentsMP4_i is the count of comments left on MP4 files uploaded at moment i, and
CommentsMedia_i is the comment count of media files in any of the big four video formats
uploaded at moment i.
For uploaders the installed base share at moment t was calculated using the
Media Dataset as

IBUp_t = (Σ_{i=0}^{t} UploadsMP4_i) / (Σ_{i=0}^{t} UploadsMedia_i)

where UploadsMP4_i is the count of files uploaded in MP4 at moment i and UploadsMedia_i is
the count of media files in any of the big four video formats uploaded at moment i.
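A minimal sketch of these two cumulative-share computations, assuming a tidy table with one row per file; column names and figures are illustrative, not the actual dataset.

```python
import pandas as pd

# Hypothetical sketch of the installed-base variables: the cumulative share of
# MP4 comments (users, IBUs_t) and of MP4 uploads (uploaders, IBUp_t) over all
# big-four files. Column names and figures are illustrative.
media = pd.DataFrame({
    "month":    ["2008-01", "2008-01", "2008-02", "2008-02", "2008-03"],
    "format":   ["AVI", "MP4", "AVI", "MP4", "MP4"],
    "comments": [10, 2, 8, 6, 9],
})

# Users: cumulative comment counts per format.
comments = media.pivot_table(index="month", columns="format",
                             values="comments", aggfunc="sum", fill_value=0)
ib_users = comments["MP4"].cumsum() / comments.sum(axis=1).cumsum()

# Uploaders: cumulative file counts per format.
uploads = media.groupby(["month", "format"]).size().unstack(fill_value=0)
ib_uploaders = uploads["MP4"].cumsum() / uploads.sum(axis=1).cumsum()
```

Because both series are cumulative ratios, a one-month lag is just a `shift(1)` on the resulting series.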
Restricting the sample to specific television shows holds many advantages. First,
it substantially facilitated the task of cleaning up the original database extraction,
particularly in the case of MP4, which is used to contain both audio and video: after
filtering the original extraction by file format, a significant proportion of the MP4
files that remained were audio files (mostly music records). Even though their presence
was relevant to the study, since it was indicative of the degree of penetration of the
format among users and uploaders and was used to assess the indirect network effects,
audio content does not compete against video formats but against audio formats (such as MP3 or
FLAC), so those files were distorting the calculation of the variable we used to capture the
direct effects in video files.
The second advantage of restricting the sample to specific television shows is
that it minimizes time biases (for instance, older Torrent files having more
comments because they have been up for a longer period of time). All the shows
selected for the sample were airing new episodes at some point during the time interval
studied, and, as the sharp drop in the number of seeders in Figure 11 shows, most activity
registered on a TPB page happens soon after the file's upload date.
A quick survey of the most active uploaders6 shows that the Torrent file of a new
episode is uploaded within 24 hours of its original broadcast. These peaks of
activity attenuate (without completely eliminating) the time bias observed for user
comments since, as the sharp drop in the number of seeders in Figure 11 demonstrates,
most of the user downloads occur soon after the upload moment. While more reliable
than seeders, the comment count still presents a bias, since files that have been up
longer tend to have a higher number of comments. This is a minor problem
since, for the same reason, older files are also likely to have more downloads, if only
because they have been up longer. Using proportions instead of absolute values should
minimize this bias.
6 An example of an episode upload catalog can be found at www.eztv.it
Figure 11: Average number of seeders per file (TV Shows sample)
A third advantage of restricting the sample to TV shows is that it
allows controlling for variations that stem from the characteristics of the viewership of a
specific show. Furthermore, since those viewers follow the show over a period of time,
using the TV Shows dataset makes it possible to follow the evolution of their
preferences over time.
The share for user adoption of MP4 during month t and television show TV was
calculated using the TV Shows Dataset as

ShareUsers_{TV,t} = CommentsMP4_{TV,t} / CommentsMedia_{TV,t}

where CommentsMP4_{TV,t} is the count of comments in MP4 files uploaded in month t for
television show TV and CommentsMedia_{TV,t} is the comment count for files in any of the
big four media formats uploaded in month t for television show TV.
The share for uploader adoption was estimated in a similar way (also with the
TV Shows Dataset) as

ShareUploaders_{TV,t} = UploadsMP4_{TV,t} / UploadsMedia_{TV,t}

where ShareUploaders_{TV,t} is the percentage of uploads in MP4 during month t of
television show TV, UploadsMP4_{TV,t} is the count of uploads in MP4 in month t for
television show TV, and UploadsMedia_{TV,t} is the count of files in any of the big four
media formats uploaded in month t for television show TV.

Figure 12: Average of comments per uploaded file (TV Shows sample)
Finally, to calculate the installed base of users of a specific television show we
used

TVIBUs_{TV,t−j} = (Σ_{i=0}^{t−j} CommentsMP4_{TV,i}) / (Σ_{i=0}^{t−j} CommentsMedia_{TV,i})

where CommentsMP4_{TV,i} is the comment count left on MP4 files belonging to television show
TV at moment i, and CommentsMedia_{TV,i} is the comment count left on all media files
belonging to television show TV at moment i; t−j is the moment corresponding to the
observation that chronologically precedes observation t.
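A minimal sketch of these per-show variables, again assuming a tidy per-file table with illustrative column names and figures:

```python
import pandas as pd

# Hypothetical sketch of the per-show variables: the monthly MP4 user share
# (ShareUsers_{TV,t}) and the lagged cumulative per-show installed base
# (TVIBUs_{TV,t-j}). Column names and figures are illustrative.
tv = pd.DataFrame({
    "show":     ["A", "A", "A", "A"],
    "month":    ["2009-01", "2009-01", "2009-02", "2009-02"],
    "format":   ["AVI", "MP4", "AVI", "MP4"],
    "comments": [6, 2, 3, 9],
})
monthly = tv.pivot_table(index=["show", "month"], columns="format",
                         values="comments", aggfunc="sum", fill_value=0)

share_users = monthly["MP4"] / monthly.sum(axis=1)   # ShareUsers_{TV,t}
cum = monthly.groupby(level="show").cumsum()          # per-show running totals
tv_ib_us = (cum["MP4"] / cum.sum(axis=1)).groupby(level="show").shift(1)
```

The `shift(1)` within each show implements the t−j lag: each observation sees only the installed base built up through the preceding observation.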
Equivalent variables were generated for all other big four formats. Even though
this section describes only MP4 variables, the variables for AVI, MKV and WMV were
calculated following exactly the same procedure. Table 1 contains the descriptive
statistics for all of them.
Table 1: Descriptive statistics
Variable Obs Mean Std. Dev. Min Max
MP4
UserShare 1295 0.1842 0.3261 0.0000 1.0000
UploadShare 1541 0.1602 0.3722 0.0000 1.0000
IBUp (lagged) 1541 0.2596 0.1263 0.0000 0.5586
IBUs (lagged) 1541 0.1924 0.0854 0.0000 0.4262
TVIBUs 1541 0.1358 0.1457 0.0000 0.8333
AVI
UserShare 1295 0.6649 0.4007 0.0000 1.0000
UploadShare 1541 0.6582 0.3722 0.0000 1.0000
IBUp (lagged) 1541 0.4163 0.2019 0.1199 1.0000
IBUs (lagged) 1541 0.4845 0.1684 0.1529 1.0000
TVIBUs 1541 0.7595 0.2190 0.0000 1.0000
MKV
UserShare 1295 0.1406 0.2687 0.0000 1.0000
UploadShare 1541 0.1667 0.2699 0.0000 1.0000
IBUp (lagged) 1541 0.2254 0.1339 0.0000 0.5274
IBUs (lagged) 1541 0.1662 0.0944 0.0000 0.3323
TVIBUs 1541 0.0923 0.1239 0.0000 1.0000
WMV
UserShare 1295 0.0103 0.0920 0.0000 1.0000
UploadShare 1541 0.0146 0.1061 0.0000 1.0000
IBUp (lagged) 1541 0.0988 0.0492 0.0000 0.3612
IBUs (lagged) 1541 0.1569 0.0476 0.0000 0.4071
TVIBUs 1541 0.0124 0.0696 0.0000 1.0000
7. Results
Results for user share
Before proceeding to the actual estimation we ran a Hausman test for each user
specification in order to determine whether it was suitable to proceed with a
random effects model or whether we should use a fixed effects one instead. The result7
was that we were not able to reject the null hypothesis that the random effects estimator
is consistent, so we carried out our estimates using it. Additionally this
allowed us to perform our regressions using Maximum Likelihood Estimation instead
of Ordinary Least Squares, which is advisable considering that the relation our
independent variables share with the dependent one may not be linear.
A summary of the results obtained for the regression of MP4 share for users can
be found in Table 2. Both the direct (TVIBUs) and indirect (IBUp) network effects
variables are significant at the 99% and 95% confidence level, respectively. The indirect
effect seems to have a stronger influence on the adoption of MP4 than the direct effect; that
is, the variety of media available in a specific format strongly affects the decision
to switch to MP4. This makes sense since it is very likely that users don't restrict their
downloads to episodes of a single show, and that the availability of other media in a
specific format conditions their choice. Evidence of a direct network effect was also
significant, if lower than in the case of the indirect network effect, with monthly
adoption of MP4 increasing along with the share of users of a specific television show.
This is likely the consequence of restricting the direct network effect to the television
show level, which we did in order to ensure that the direct and indirect network effects
were properly distinguished. Even so, the effect has the sign we expected and confirms
that format adoption by the users affiliated to a television show does have an impact on
future adoption decisions. The yearly dummies we included are not significant,
probably because the random effects model already accounts for some of the variation over
time, which detracts intensity and significance from our dummies. We chose to keep them
because running the same model for the share of adoption of AVI revealed some
interesting effects.
7 Stata results for the Hausman tests are available in Appendix 3.
Table 2: Random Effects MLE regression of MP4 share for users with year dummies
Group variable: show   Groups: 50   Obs per group: min = 3, avg = 22.8, max = 65
Observations: 1255   Pseudo R-sq: 0.1996   Log-likelihood: -291.935
LR Chi2: 145.65   Prob > Chi2: 0.0000
ShareUsers (MP4)   Coef.   Std. Err.   t   P>t   [95% Conf. Interval]
IBUp .4122 .1983 2.08 0.038 .0234 .8010
TVIBUs .2704 .0892 3.03 0.002 .0956 .4553
d2005 .0732 .2217 0.33 0.741 -.3613 .5078
d2006 .0305 .2194 0.14 0.889 -.3995 .4605
d2007 .0726 .2195 0.33 0.741 -.3577 .5030
d2008 .0965 .2197 0.44 0.660 -.3342 .5273
d2009 -.0099 .2232 -0.04 0.964 -.4474 .4275
d2010 .0897 .2240 0.40 0.689 -.3493 .5288
d2011 -.0206 .2242 -0.09 0.926 -.4602 .4188
d2012 .1736 .2164 -0.74 0.460 -.2872 .6344
Cons -0.221 .2164 -0.10 0.918 -.4463 .4020
Table 3 shows the results of applying the same user adoption model to the AVI
user adoption share (with the AVI variables). Interestingly, the adoption of this format
also shows statistically significant direct and indirect network effects (the latter with
80% confidence). The indirect network effect is weaker than in the case of
MP4. However, the most interesting element in this
regression is the year dummies, for two reasons: first, they show a higher degree of
significance overall than in the case of MP4, which increases as we approach 2012;
secondly, all the estimated coefficients are negative. While in the case of MP4 the
network effects dominated the adoption process, in the case of AVI there are two forces
at work: one is the loss in usefulness the format experienced during the period studied
(captured by the yearly dummies), which drives users away from it, while the other is
the network effect generated by the format's initially widespread adoption, which
attracts users. As we saw in Section 4, the repulsive force ended up overcoming the
attracting one, but without the former it is likely that the weaker intensity of the indirect
network effect might not have sufficed to move the user base of TPB to MP4,
which would have remained locked into AVI. This result points to the relevance of external adoption
drivers in the process of technological replacement with network effects, something that
will be examined further in Section 8.
Table 3: Random Effects MLE regression of AVI share for users with year dummies
Group variable: show   Groups: 50   Obs per group: min = 3, avg = 22.8, max = 65
Observations: 1255   Pseudo R-sq: 0.2025   Log-likelihood: -463.903
LR Chi2: 235.65   Prob > Chi2: 0.0000
ShareUsers (AVI)   Coef.   Std. Err.   t   P>t   [95% Conf. Interval]
IBUp .2105 .1496 1.41 0.159 -.0827 .5038
TVIBUs .2947 .0709 4.15 0.000 .1556 .4339
d2005 -.0836 .2535 -0.33 0.742 -.5806 .4133
d2006 -.0368 .2506 -0.15 0.883 -.5281 .4544
d2007 -.1682 .2507 -0.67 0.502 -.6597 .3231
d2008 -.1735 .2502 -0.69 0.488 -.6639 .3168
d2009 -.1198 .2538 -0.47 0.637 -.6174 .3777
d2010 -.1883 .2586 -0.73 0.466 -.6952 .3185
d2011 -.1410 .2595 -0.54 0.587 -.6498 .3676
d2012 -.4311 .2635 -1.64 0.102 -.9476 .0853
cons .5532 .2802 1.97 0.048 .0039 1.1025
Results for uploader share
Before running the regression we also performed a Hausman test on the specification,
which confirmed that the random effects model was efficient as well.
Table 4 shows the results of the regression, and confirms the presence of an
indirect network effect in the monthly uploader share. As was discussed in Section 6, the
variable IBUs captures the impact an increase in the installed base of users has on the
dependent variable, and there seems to be evidence of an uploader response to the
change in the preferences of media-consuming users. However, the direct network effect
doesn't seem to be significant, which is understandable considering that users of
a specific television show are only a small subset of the total user base of TPB.
Encoding and uploading a Torrent file is not a trivial task, and acquiring the technical
proficiency needed to do so efficiently entails an investment in learning costs
several orders of magnitude larger than the one users (who only have to download or
update their media player) assume when they decide to switch formats.
Furthermore, the process of uploading files into the BitTorrent network has economies
of scale, since the method used can be automated and ported to all kinds of media,
regardless of their content. All of this, together with the fact that the final goal of an
uploader is to maximize downloads, explains the weak role direct network effects have
in this instance: switching to a new format is only worthwhile if all the media a specific
uploader is going to release will be in the new one, and switching to MP4 to cater
exclusively to a small subset of users is just not worth it.
Table 4: Random Effects MLE regression of MP4 share for uploaders with year dummies
Group variable: show   Groups: 50   Obs per group: min = 4, avg = 29.8, max = 70
Observations: 1491   Pseudo R-sq: 0.1207   Log-likelihood: -180.342
LR Chi2: 49.52   Prob > Chi2: 0.0000
ShareUploaders (MP4)   Coef.   Std. Err.   t   P>t   [95% Conf. Interval]
IBUs .9075 .1694 5.36 0.000 .5754 1.2397
TVIBUs .0113 .0641 0.18 0.859 -.1143 .1370
d2005 .2988 .2001 1.49 0.135 -0.0934 .6911
d2006 .1181 .0633 1.86 0.063 -.0066 .2429
d2007 .1818 .0539 3.37 0.010 .0795 .2876
d2008 .1340 .0468 2.86 0.004 .0421 .2258
d2009 .0638 .0396 1.61 0.107 -.0138 .1416
d2010 .1127 .0358 3.15 0.002 .0425 .1829
d2011 .0873 .0304 2.87 0.004 .0276 .1470
d2012 .0508 .0320 1.59 0.112 -.0119 .1135
Cons -.0960 .0593 -1.62 0.105 -.2123 .0219
Another interesting result of the uploaders’ specification is the statistical
significance of the year dummies we included. This suggests that while network effects
were the main factor leading to MP4 adoption among users, the switch in the case of
uploaders was strongly influenced by the external factors that made the format more
attractive over the time period studied. It is also indicative of a more active role of the
uploaders’ side in the adoption process of MP4, since the group’s behavior is consistent
with the changes in the technical landscape that ended up deciding the adoption of the
format.
8. Theoretical model
In this section we propose a microeconomic model to explain some of the results
observed in our empirical study. In order to allow its generalization to contexts
other than file formats in TPB we have included production costs and prices,
which would be roughly equivalent to download time or the cost of encoding and
uploading different file formats. We also added adoption costs to this model, which
were omitted from the empirical part due to a lack of data. Our model is set in a
monopolistic competition market, because of the high degree of differentiation between
the television shows that were analyzed in the empirical part.
The model shares some similarities with the one developed by Götz (ibid),
which also describes a monopolistic competition market in which firms adopt in order
to reduce their marginal cost and increase their profits via demand volume. However,
the equilibrium of Götz's model is always full adoption by all firms, which is not what
we observed in our empirical study and is not what happens in general in the real world.
So in this section we strive to create a model that reflects those incomplete-penetration
equilibria and that also takes into account the adoption decisions made by
consumers.
Basic setting
The starting point is identical to the one postulated by Götz in his model: the industry is
composed of a number n of active firms which produce a differentiated product,
and a number E of consumers. One of the first departure points is that, unlike Götz, we
assume that consumers are myopic and therefore only maximize for the present moment
in time. Each has preferences described by a Dixit-Stiglitz consumption index
U(t) = [∫_0^n y(j,t)^ρ dj]^(1/ρ),  with 0 < ρ < 1   [1]
where y(j,t) is the amount of variety j demanded by a consumer at time t. We will
assume all consumers have the same income, which we normalize to 1, and that there is
no numeraire good on which to spend money. As only one unit of income is spent on the
goods, total expenditure in the market equals E. Therefore firm j at moment t faces
the aggregate demand
y(j,t) = E · p(j,t)^(−σ) / ∫_0^n p(k,t)^(1−σ) dk,  with σ = 1/(1−ρ)   [2]
where p(j,t) is the price of variety j at moment t. As Götz does in his paper, the term in
the denominator will be called the "price index", and the actions of rivals impact the
demand firm j faces through it. Another thing worth noticing is that the demand a single
consumer has for good j at moment t is the same expression without multiplying by
E. For a marginal cost c the profit-maximizing price firms set is p = c/ρ.
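As a numerical illustration, the sketch below searches a fine price grid for the profit-maximizing price implied by the CES demand, holding the price index fixed; all parameter values are hypothetical, and with ρ = 0.5 and c = 1 the optimum lands at c/ρ = 2.

```python
# Numerical sanity check (hypothetical parameters) that the constant-markup
# price p = c / rho maximizes a firm's profit when the price index in the
# denominator of the demand function is taken as given.
def profit(p, c=1.0, rho=0.5, E=1.0, price_index=10.0):
    sigma = 1.0 / (1.0 - rho)                  # elasticity of substitution
    demand = E * p ** (-sigma) / price_index   # CES demand, index held fixed
    return (p - c) * demand

# Search a fine price grid above marginal cost for the profit maximizer.
best_price = max((i / 100 for i in range(101, 500)), key=profit)
print(best_price)  # close to c / rho = 2.0
```

Because the price index is taken as given, each firm's problem separates and the familiar constant markup over marginal cost emerges.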
Initially firms have a common and constant marginal cost c0, which they can
reduce by adopting a new technology. This switches their costs to c1, with

c1 = c0 / γ,  γ > 1   [3]

where γ is a constant that reflects the efficiency gain. Here our model starts to depart
widely from the one postulated by Götz, since we introduce the additional assumption
that consumers also need to adopt the technology in order to be able to purchase goods
from one of the firms using the new technology. If a consumer is not an adopter he will
only be able to purchase goods from firms that are producing with the old technology.
However we assume that the technical innovation offers backwards compatibility, so
that a consumer that adopted the technology will still be able to purchase goods from
firms that didn’t adopt. Table 5 summarizes the possible interactions.
Table 5: Possible interactions between firms and consumers
(rows: consumer technology; columns: firm technology)

                          Firm adopts              Firm doesn't adopt
Consumer adopts           Purchases at price p1    Purchases at price p0
Consumer doesn't adopt    Doesn't purchase         Purchases at price p0

where p0 = c0/ρ and p1 = c1/ρ are the prices set by non-adopting and adopting firms.
Additionally we define variables to measure adoption shares in each side of the
market: for that purpose let q and s be the adoption rates of the technology among firms
and consumers, respectively.
The final difference with the Götz model is that consumers and firms are not
identical: each agent i has an adoption cost F_i^k, with k ∈ {C, F}, a sunk cost
which they need to incur in order to adopt the technical innovation. These can be
interpreted as learning costs or monetary costs that the agent needs to assume
in order to get up and running with the new technology. We will assume that F_i^k is
identically and independently distributed within both groups, with maximum values F̄^k.
Technology adoption by consumers
Since the market is fragmented, the price index a consumer faces changes depending
on whether or not he or she has adopted the technology. Consumers that have done so
will be able to purchase goods from firms in the [0, n] interval, so their price index can
be written as

P_A = ∫_0^n p(j,t)^(1−σ) dj = qn·p1^(1−σ) + (1−q)n·p0^(1−σ)   [4]

On the other hand, consumers that have not done so will only be able to
purchase from firms which have not adopted the new technology, located
in the [qn, n] interval. Their price index will be

P_N = (1−q)n·p0^(1−σ)   [5]

where p0 = c0/ρ and p1 = c1/ρ are the prices set by non-adopting and adopting firms,
respectively.
Given these assumptions and using [1] and [2] we can describe the value of the
Dixit-Stiglitz index for a consumer that has adopted the technology as

U_A = P_A^(1/(σ−1)) = [qn·p1^(1−σ) + (1−q)n·p0^(1−σ)]^(1/(σ−1))   [6]
For a consumer that has not adopted the technology we have the following
utility:

U_N = P_N^(1/(σ−1)) = [(1−q)n·p0^(1−σ)]^(1/(σ−1))   [7]
Consumers will take on the technology whenever the difference between their post- and pre-adoption utility levels is higher than their adoption cost k_c:

V_A - V_N > k_c    [8]

Having the costs distributed i.i.d. (uniformly, with maximum k_c^max) has the property that the threshold adoption cost at moment t can be used to estimate the adoption share a technology has. If a consumer with adoption cost k̄_c decides to adopt the technology, it must be that all the consumers with adoption costs below that level have already adopted or will choose to do so as well. This way the adoption share for consumers can be expressed as

s = k̄_c / k_c^max    [9]
By plugging [4] and [5] into [6] and [7] (respectively) and then substituting in equation [8] we get the following function for threshold adoption costs (some steps of the process are included in Appendix 4):

k̄_c = (αy/c) n^{(1-α)/α} { (1 - q + q β^{-α/(1-α)})^{(1-α)/α} - (1-q)^{(1-α)/α} }

Finally, by substituting the prices by the expressions in [3] and k̄_c by the one in [9] and then rearranging, we obtain the user adoption share as a function of all the other variables:

s = (αy n^{(1-α)/α} / (c k_c^max)) [ (1 - q + q β^{-α/(1-α)})^{(1-α)/α} - (1-q)^{(1-α)/α} ]    [10]

Equation [10] has several interesting elements. First, we can see how the adoption share among users is positively related to the total number of companies operating in the market. On one hand this reflects the preference for variety consumers have, and their unwillingness to abandon consumption of a portion of the available products in order to adopt the new technology: the bigger the market is on the supply side, the less variety they are giving up when finally adopting. On the other hand, it is due to the fact that a higher number of firms means that a given proportion of adopters entails a bigger increase in purchased volume (and therefore utility) than it would if the total number of firms were lower. The second interesting element is that the expression between the square brackets vanishes when q = 0, which is interpreted as a null adoption share among users. As we will soon see, firms have a similar bias, pointing to the need to have an installed base on day zero for a technology to be able to break into a market. In the empirical case we studied, MP4 had such an installed base, since some music and HD contents had to be offered in the wrapper due to the limitations of AVI. Without these external factors, it's unlikely that the format would have enjoyed the success it finally ended up having. Also of note is the negative relationship the share has with the price of the non-technological goods, c/α: intuition dictates that if the price of the non-technological firms is very high, an innovation that reduces it would be a welcome development, and would be more desirable than in a case where the price was already low. However, we need to keep in mind that the decrease in the price is proportional, that is, higher initial prices mean higher prices after adoption as well. This, together with the fact that after purchasing the technology the consumer won't stop consuming the non-technological goods (love of variety), adds up to a lower relative increase in the consumer's utility. The negative influence of k_c^max is easier to explain, since a higher value for it means that adoption costs are spread over a larger interval: an increase of it means, all else equal, a decrease in the proportion of users that have a utility differential higher than their adoption cost.
Finally, we have to explain what the expression between the square brackets is. It is derived from the adopters' utility function, and it captures the "share effect" technology adoption has in terms of utility. In the first term, (1 - q + q β^{-α/(1-α)})^{(1-α)/α}, the marginal utility of a share q of firms is replaced by the new, higher marginal utility associated with their lower prices. On the other hand, (1-q)^{(1-α)/α} reflects the loss of utility that comes from not adopting and being unable to purchase from firms that are producing with the new technology. In the first case, the impact bigger shares have is positive, increasing adopter utility and therefore the adoption rate, while in the second, bigger shares have a negative effect. We refer to the result of subtracting the non-adopters' share effect from the adopters' as the "net share effect".
Figure 13 shows how the net share effect changes when we vary q and α for a constant level of β.
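The two share effects are easy to evaluate numerically. The sketch below follows the equations as reconstructed above; the normalization y = c = k_c^max = 1 and the example parameter values are illustrative assumptions, not values taken from the empirical part:

```python
# Net share effect and consumer adoption share, following the reconstructed
# equations above. All parameter values here are illustrative assumptions.

def adopter_share_effect(q: float, alpha: float, beta: float) -> float:
    """Utility term of an adopter: a share q of firms sells at the lower price."""
    e = (1 - alpha) / alpha
    return (1 - q + q * beta ** (-alpha / (1 - alpha))) ** e

def non_adopter_share_effect(q: float, alpha: float) -> float:
    """Utility term of a non-adopter, restricted to the (1 - q) share of old firms."""
    return (1 - q) ** ((1 - alpha) / alpha)

def net_share_effect(q: float, alpha: float, beta: float) -> float:
    """The quantity plotted in Figure 13: adopters' minus non-adopters' share effect."""
    return adopter_share_effect(q, alpha, beta) - non_adopter_share_effect(q, alpha)

def consumer_share(q, alpha, beta, n, y=1.0, c=1.0, ckmax=1.0):
    """Equation [10], truncated to [0, 1] since it is a share."""
    s = alpha * y * n ** ((1 - alpha) / alpha) / (c * ckmax) * net_share_effect(q, alpha, beta)
    return min(max(s, 0.0), 1.0)
```

At q = 0 the net share effect is zero regardless of α and β, which is the installed-base problem discussed above; for β = 0.9 and α = 0.5 it reduces to (10/9)q, within the 0 to 1.2 range plotted in Figure 13.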
Technology adoption by firms
To model the adoption decision on the firms' side we will use a similar method as with the consumers, only in this case instead of utility we will use the benefit of each firm i, which we define as

π_i = (p_i - c_i) x_i    [11]
Here the benefit will also need to be calculated separately for adopting and non-adopting firms, since the former are servicing only the subset of technology adopters sE (recall that E was the total number of customers in the market). We obtain the benefit function for firms that adopt the technology by applying [2] and substituting the prices:

π_A = (βc/α - βc) (βc/α)^{-1/(1-α)} [ sEy / ∫_0^n p_j^{-α/(1-α)} dj ]    [12]
Recall that s was the share of consumers that have adopted the technology, and that this is the only group an adopting firm will service. Notice, however, that the integral in the denominator runs over all n firms: that is because adopting consumers also purchase goods from non-adopting firms, so their price index covers the whole market. The benefit for non-adopting firms is
[Figure 13: Net share effects for β = 0.9. The net share effect is plotted against q (0.1 to 0.9), with one curve for each value of α from 0.1 to 0.9.]
π_N = (c/α - c) (c/α)^{-1/(1-α)} [ sEy / ∫_0^n p_j^{-α/(1-α)} dj + (1-s)Ey / ∫_{qn}^n p_j^{-α/(1-α)} dj ]    [13]
Firms will adopt the technology if

π_A - π_N > k_f    [14]
Substituting [12] and [13] in [14] and developing we obtain

(1-α) (Ey/n) [ β^{-α/(1-α)} s / (1 - q + q β^{-α/(1-α)}) - s / (1 - q + q β^{-α/(1-α)}) - (1-s)/(1-q) ] > k_f

which can be rearranged into

k̄_f = ((1-α) Ey / n) [ (β^{-α/(1-α)} - 1) s / (1 - q + q β^{-α/(1-α)}) - (1-s)/(1-q) ]    [15]
Function [15] determines the threshold adoption cost for firms as a function of the other variables. The most striking feature of this solution is that the adoption decision of firms doesn't depend on the initial marginal cost: this was expected, since under this kind of monopolistic competition the markup of firms is p - c = ((1-α)/α) c. The derivative of the markup with respect to the cost is always positive and constant, which means that the markup decreases at a constant rate along with the costs. Given this, the only relevant variable to firms is the variation in the marginal cost and not the cost level itself.
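Written out, with p = c/α as the pre-adoption price (the standard markup rule under this kind of monopolistic competition, as assumed in our reading of the model):

```latex
% Markup under CES monopolistic competition, pricing rule p = c/alpha
p - c \;=\; \frac{c}{\alpha} - c \;=\; \frac{1-\alpha}{\alpha}\, c ,
\qquad
\frac{\partial (p - c)}{\partial c} \;=\; \frac{1-\alpha}{\alpha} \;>\; 0 .
```

After adoption the markup becomes (1-α)βc/α, so the relative change in the markup equals β for any level of c, which is why only β, and not c, appears in equation [15].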
When we derived the adoption function for consumers we saw how their decision was conditioned only by the number of adopting firms, and the proportion of consumers that had adopted was irrelevant to them. However, this is not so in the case of firms, since the decisions of other agents on their side of the market do impact their benefit function, as denoted by the appearance of q in their adoption function.
Since 0 < β < 1 and 0 < α < 1, ∂k̄_f/∂q < 0, which means that the impact q has on the adoption threshold is negative: the higher the share of firms that adopt, the lower the incentive non-adopting ones will have to do so as well. The average number of consumers per company, E/n, has a positive relationship with the threshold adoption costs (which means a positive relationship with the share of adopting firms). This result is related to consumers' love of variety: in more competitive markets the increase in sales volume doesn't make up for the loss of revenue that comes from servicing a smaller share of the market. This result can be related to a market's maturity level: more established markets will be more likely to become locked into the older technology, while firms operating in more concentrated markets will stand to gain more from adoption, since they will experience a higher increase in demand when they lower prices. There is an exception to this point when the market is operating under a monopoly, in which case the firm will never adopt the technology because its income does not vary.
Finally, on the right side of equation [15] we find a fraction: its denominator, 1 - q + q β^{-α/(1-α)}, should be familiar, since it is similar to the expression we found for the consumers' adopter share effect, only in this case it is not reflecting the increase in utility of adoption but the variation in the price index as a function of the proportion of adopting firms. It divides (β^{-α/(1-α)} - 1) s, an expression indicative of the increase in income adopting the technology entails: it measures the variation in the firm's income after adopting the new technology, taking into account the share of adopting consumers as well as the effect a reduction in the price has on the demanded volume. If this fraction is smaller than (1-s)/(1-q), the firm is making less by adopting the technology than by not doing so, and this means a negative value for k̄_f, which we interpret as no adoption on the firms' side. The ratio can be interpreted as the increase in income derived from adoption, corrected by the proportion of adopting firms. The more firms that choose to adopt, the lower its value will be, since more firms will already be competing at lower prices. On the other hand, the value of the ratio increases together with the proportion of adopting consumers, since higher adoption rates on the consumer side entail more demand for the technological goods.
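These comparative statics can be checked numerically. The sketch below implements equation [15] as reconstructed above (the parameter values in the checks are illustrative assumptions) and confirms that the threshold falls as q rises and rises with s:

```python
# Threshold adoption cost for firms, equation [15] as reconstructed above.
# Valid for 0 <= q < 1; the example parameter values are illustrative.

def firm_threshold(q, s, alpha, beta, E, n, y=1.0):
    m = beta ** (-alpha / (1 - alpha))     # demand boost from the lower adopter price
    gain = s * (m - 1) / (1 - q + q * m)   # extra income earned on the s*E adopters
    loss = (1 - s) / (1 - q)               # income forgone on non-adopting consumers
    return (1 - alpha) * E * y / n * (gain - loss)
```

A negative value is read as no adoption on the firms' side; note that the initial marginal cost c does not appear anywhere, as discussed above.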
Fit with empirical evidence
The microeconomic model greatly emphasizes the need for a day-zero installed base for an evolutionary innovation to be able to succeed. This was also the case with MP4, where external factors such as the advent of media centers and portable hardware greatly incentivized adoption, providing an installed base for the successful diffusion of the format. Furthermore, the introduction of adoption costs also seems to reflect reality, providing an explanation for the gradual acceptance and incomplete penetration of the format. The model also seems to capture the cascading and self-reinforcing relation between the adoption shares of firms and consumers: higher proportions on one side reinforce adoption on the other, while, at least on the supply side, actions taken on the same side push in the opposite direction.
Finally, even though it was not explicitly addressed in the empirical part, another way in which our theoretical model parallels the format adoption dynamics in TPB is the importance the maturity of a market has in the success or failure of the new technology: the user base of TPB increased during the whole period studied, which may have facilitated the transition from AVI to MP4. Along the same lines, it captures the incentive some firms have to develop a niche market and keep offering the non-technological good despite the migration of other competitors to the new technology. A low substitution elasticity increases this effect and makes adoption less attractive, since the actual quantities consumed will barely be affected by changes in the prices, while an income effect (coming from the price reduction of a part of the goods traded) increases quantities consumed in general.
As mentioned before, the most important departure point of the model with respect to the empirical part is the inclusion of prices for the goods, which translate into a demanded volume for each good. However, the improvements MP4 offers don't necessarily translate into a quantitative increase in demand but rather into a qualitative improvement of the goods purchased. A far-fetched way of looking at it is in terms of bandwidth (which would be equivalent to income) and some kind of quality/size ratio (which would be equivalent to price). More elegant modifications to the model that take these nuances into account are examined in the discussion section.
9. Simulation
In order to better understand the dynamics of the model we ran several simulations varying some of the parameters described in Section 8. The software we used was NetLogo, an application developed at Northwestern University that facilitates the creation and execution of agent-based models. In order to implement our model we created two sets of agents, consumers and firms. We kept all the parameters for consumers constant across all the simulations, varying only the firm-side and global ones. A total of 5'000 consumers were created, and each of them was assigned an adoption cost in the [0 – 100] range, with all values independently and identically distributed amongst the population. The number of firms varied from simulation to simulation in the [50 – 500] range. Their adoption costs were set in the range [0 – fkmax], where fkmax is a parameter whose value was also varied in different simulations. In order to solve the initial-conditions problem the microeconomic model presents (i.e. that no firm will adopt if no consumers have adopted, and no consumer will adopt if no firm has adopted), a proportion ibc of consumers were made into adopters at random before the simulation started. Tables 6 and 7 describe the parameters and values used in the simulation.
Table 6: List of parameters used in the simulation

Parameter Name   Explanation                                        Value
cons             Number of consumers (E)                            5'000
n                Number of firms (n)                                [50 – 500]
alpha            Substitution elasticity (α)                        [0.1 – 0.9]
beta             Marginal cost reduction (β)                        [0.1 – 0.9]
ckmax            Maximum value for consumer adoption costs          100
fkmax            Maximum value for firm adoption costs              [50 – 500]
ibc              Initial adoption share of consumers                [0 – 0.9]
cost             Marginal cost of production (c)                    [100 – 500]
Table 7: List of variables used in the simulation

Variable Name       Explanation
s                   Adoption share of consumers
q                   Adoption share of firms
kc                  Threshold adoption cost for consumers
kf                  Threshold adoption cost for firms
ConsAdoptionCost    Adoption cost for individual consumers (owned)
FirmAdoptionCost    Adoption cost for individual firms (owned)
The following pseudocode describes the routine executed during each simulation (the actual code used can be found in Appendix 5):
create consumers, assign them ConsAdoptionCost
create firms, assign them FirmAdoptionCost
make a proportion ibc of consumers adopt the technology
while there is no equilibrium
calculate q, s, kc, kf
consumers with ConsAdoptionCost < kc adopt the technology
firms with FirmAdoptionCost < kf adopt the technology
check for equilibrium
loop
When adoption shares for both parts stop varying from cycle to cycle the system
enters a stationary state and the equilibrium condition is considered fulfilled, which in
turn causes the simulation to stop and to return the equilibrium value of the variables as
well as the value of the parameters used in that iteration.
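Because the adoption costs are uniformly distributed, the agent-based loop above can also be written as a deterministic fixed-point iteration on the shares themselves: each cycle maps (s, q) to the thresholds of equations [10] and [15] divided by ckmax and fkmax. The sketch below is a simplified stand-in for the NetLogo routine, built on the closed forms as reconstructed in Section 8; the income level y and the example parameter values are illustrative assumptions:

```python
def simulate(alpha, beta, n, E, cost, ckmax, fkmax, ibc, y=1.0, max_iter=10_000):
    """Iterate the share map until a stationary state (the equilibrium condition)."""
    m = beta ** (-alpha / (1 - alpha))
    e = (1 - alpha) / alpha
    s, q = ibc, 0.0                            # seed with the initial consumer base
    for _ in range(max_iter):
        # consumer threshold (kc) and share, equation [10]
        kc = alpha * y / cost * n ** e * ((1 - q + q * m) ** e - (1 - q) ** e)
        s_new = max(s, min(kc / ckmax, 1.0))   # adoption is irreversible
        # firm threshold (kf) and share, equation [15]; guard the q = 1 limit
        if q < 1.0:
            kf = (1 - alpha) * E * y / n * (s * (m - 1) / (1 - q + q * m) - (1 - s) / (1 - q))
        else:
            kf = fkmax
        q_new = max(q, min(max(kf, 0.0) / fkmax, 1.0))
        if abs(s_new - s) < 1e-9 and abs(q_new - q) < 1e-9:
            break                              # stationary state reached
        s, q = s_new, q_new
    return s, q
```

With one illustrative parameterization, an initial base of 30% of consumers is not enough to trigger firm adoption, while 70% tips the market to full adoption on both sides, reproducing the day-zero installed-base behaviour discussed in Section 8.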
Simulation results
A total of 12’000 simulations were ran, varying the parameters as specified on Table 6.
On the firm side it was immediately evident that the main determinants of adoption
were the values of alpha and beta: high values of beta are a disincentive to adoption,
since as we saw on the previous section they have a direct impact on the firms’ markup.
On the other hand higher values for alpha exert an opposite force and lead to more
9 The actual code used can be found in Appendix 5
45
adoption. Figure 14 shows firm adoption shares for different equilibrium states varying
only alpha and beta.
The rest of the parameters also had an impact on the equilibrium adoption shares, although they condition adoption levels rather than the binary success or failure of a new technology in getting a foothold in a market. As can be appreciated in Figure 15, fkmax has a negative impact on the adoption share, and so does n, resulting in higher equilibrium adoption shares for lower values of both.
[Figure 14: Equilibrium adoption shares for firms with fkmax = 250, n = 500, cost = 250 and ibc = 0.3. The firm adoption share is plotted against alpha (0.05 to 0.95), with one curve per value of beta from 0.1 to 0.9.]

[Figure 15: Equilibrium adoption shares for firms with alpha = 0.75, beta = 0.25, cost = 20, and ibc = 0.3. The firm adoption share is plotted against fkmax (100 to 500), with one curve per value of n from 50 to 500.]
On the consumers’ side the effect of beta on the adoption share is the opposite
as in the firms’ case: a higher reduction in marginal cost translates into lower prices for
consumers, increasing the incentive they have to adopt. On the other hand, lower values
for alpha also favor technology adoption, since a lower substitution elasticity allows
consumers to take full advantage of the decrease in prices for a part of the goods offered
without sacrificing much welfare for the loss of diversity in their consumption basket. It
should be noted that consumers’ adoption decisions are not independent for the firms’:
this is especially noticeable when we look at the equilibrium shares for consumers when
we vary alpha and beta, leaving the rest of parameters constant.
It quickly becomes apparent that if there is no adoption on the firms’ side there
will not be further adoption among consumers either. This becomes clear when we vary
the values of beta: higher values are attractive to consumers and unattractive to firms,
and when we increase its values we see how the effect on the firms’ side dominates the
other one, as evidenced by the lower values for equilibrium shares of consumers. Figure
16 shows the consumer equilibrium shares for the market that was described in Figure
14 (varying also alpha and beta).
[Figure 16: Equilibrium adoption shares for consumers with fkmax = 250, n = 500, cost = 250 and ibc = 0.3. The consumer adoption share is plotted against alpha (0.1 to 0.9), with one curve per value of beta from 0.1 to 0.8.]

Finally, another set of parameters relevant to consumers are the initial marginal cost and the number of firms operating in the market. Figure 17 shows how the equilibrium adoption share varies when we change them, leaving all the remaining parameters constant. As was explained in the previous section, a higher marginal cost translates into higher prices for consumers, which makes it harder to justify the investment necessary to adopt the new technology. On the other hand, a higher number of companies constitutes
an incentive to adoption since even a small proportion of adopting firms can mean a
great variety of products being offered at a lower price, which encourages consumer
adoption. This figure can be compared to Figure 15, since both show equilibria on
different sides for identical markets. Notice how even if the penetration of the
technology is incomplete on the firm side, it can be a sufficient incentive to encourage
adoption by all consumers.
10. Discussion
In the empirical part of this project (Sections 6 and 7) we demonstrated the impact network effects have had on the substitution of AVI by MP4 as the dominant format for containing audiovisual files in TPB. We found significant coefficients for both forces in the case of peers that download, and for the indirect effect in the case of uploaders. Even though the pseudo-R-squared coefficients we ended up obtaining were modest, they are sufficient when one considers that our models only accounted for network effects and not for market conditions, which probably explain a lot of the variance in the sample.
[Figure 17: Equilibrium adoption shares for consumers with fkmax = 250, alpha = 0.75, beta = 0.25 and ibc = 0.3. The consumer adoption share is plotted against cost (25 to 475), with one curve per value of n from 100 to 500.]

Another factor that might have moderated the determination coefficients is the noise present in the sample: due to the technical restrictions, only the files that unambiguously
belonged to a format and television show were selected, a process during which many
observations whose content or format we were not certain of were discarded. If we had used a less basic algorithm and had gotten into content disambiguation, or had even used the magnet links provided in the dump to directly check the extension of each Torrent file, we would have ended up with a more balanced panel and the results of our analysis could have been more reliable overall. Carrying out these kinds of techniques requires a time investment and a level of technical expertise which were not available to us when developing this project and are, all in all, not its purpose.
A possible improvement for the empirical part is related to the data generation
itself. Carrying out our own extraction while taking into account the empirical goals of
our research would enable us to obtain additional fields that were not collected in the
raw database we used for this project. The one containing the name or alias of the
uploader would have been of particular interest to us, since it would provide us a way of
controlling for distinct uploaders across different television shows. Since the author of
the database did upload some of the code he used doing something along those lines
may be feasible in the future. Obtaining additional geographical data of all peers would
also be a useful development: controlling for geographical factors would allow us to
incorporate variables that account for the popularization of the format resulting from the
generalization in the use of mobile devices or HD screens, since this were introduced at
different points in time for different countries. Doing so would allow creating
something akin to a treatment variable, further improving the reliability of the model.
However a regular dump of the TPB website would not suffice garner that information,
since it is not openly available to the public. In order to get access to it the collaboration
of the site administrators would be essential, and given the ambiguous legal status of
some of the files shared it is unlikely that they would be willing to give us access to the
information.
Another interesting way of developing the empirical model would be to set up a discrete choice model similar to the one used by Ohashi (2004) for studying standard competition in the VCR market. This approach separates more elegantly the network effects from the market conditions which push a user to choose either format. In order to implement such an approach additional information would be needed, and it would be useful to focus on a single side of the BitTorrent ecosystem. A nested logit model such as the one developed by Ohashi would probably be unnecessary, but that still leaves the construction of the model for estimating the utility function, for which we would need much more information on both the users and the contents being studied.
A first step towards that end could be diving deeper into the demographic characteristics of the audience of each television show. If we had information on the age, income or education of spectators, additional variables could have been included to model more accurately the trends that, throughout the decade, made MP4 more attractive than AVI. Ratings companies such as Nielsen, Google, IMDB or Yahoo have data which might have been used towards that end; however, it is proprietary and they only make it available in exchange for large fees. On the other hand, broadcast companies do offer some free information to that effect, although it's often through press releases that only contain partial data.
Including sales data of devices compatible with MP4, such as the PlayStation 3 or HD television sets, is another way of giving more accuracy to the time effects and obtaining more reliable estimates for the magnitude of the network effects. Sales figures are an attractive metric to model the increase in utility the format experienced throughout the 2000s. Besides the different introduction dates, an additional complication to this approach is that these platforms may have enjoyed different degrees of popularity in different countries, and lacking a way of segregating our TPB observations along geographic lines complicates the identification of hardware effects.
With the same dataset we could also have studied similar format competition in music instead of audiovisual media. Looking at how MP4 has performed against other audio container formats such as MP3 or FLAC should yield similar results to the ones we observed for video container formats. However, the approach we used of clustering our observations by television show would not be valid in this instance, since downloads of music files have a longer peak cycle. A way to proceed would be using Billboard charts as a way of measuring the popularity of an artist at a given point in time and looking at uploads and downloads of their works taking that into account. Video codec battles, such as DivX versus x264, could also be studied, although sampling for them would be complicated given that x264 files are more likely to be identified as such, since the codec is often used as an alternative way of publicizing a video file as high definition.
As for the microeconomic model, a possible way of furthering its development would be to allow for more differentiation among agents. A way of achieving that goal among consumers would be to do away with their homogeneous income, which would probably present us with interesting dynamics regarding income distribution and technology adoption. Withdrawing their obligation to spend all their income on the goods offered in the monopolistic market is another possibility, which would allow us to have a numeraire and therefore measure more reliably the variation in utility that adopting a technology entails. A final possibility on the consumer side would be the inclusion of a utility bonus tied to the technology, proportional to the quantity consumed from adopting firms. On the firms' side, allowing for variable costs is also a way of including diversity and making the model more realistic. Alternatively, a rank constant could be assigned to each company in order to obtain different levels of consumption for each company.
This project was carried out exclusively with information freely available to the public, and the database used is by no means one-of-a-kind. Staggering amounts of information that can shed light on a myriad of questions raised by the social sciences are at the disposal of anyone interested and with a measure of technical know-how. The information that was finally used in the project is dwarfed by the actual contents of the database, which is already an incomplete copy of the information available on the TPB website. The increasing availability of large public-domain datasets that, after cleansing and preparation, supply almost universal observations opens a very stimulating field for new theory development, as well as for replication of previous research that was based on attitudinal data or smaller samples.
11. Appendix 1 – Code used for database cleanup
NOTE: the character “_” indicates that the line was truncated for it to fit this document
Karel Bilek’s original XML file had the following structure:
<!ELEMENT archive (Torrent*)>
<!ELEMENT Torrent (id, title, magnet, size, seeders,_
leechers, quality?, uploaded, nfo, comments)>
<!ELEMENT quality (up,down)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT magnet (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT size (#PCDATA)>
<!ELEMENT seeders (#PCDATA)>
<!ELEMENT leechers (#PCDATA)>
<!ELEMENT uploaded (#PCDATA)>
<!ELEMENT nfo (#PCDATA)>
<!ELEMENT when (#PCDATA)>
<!ELEMENT what (#PCDATA)>
<!ELEMENT up (#PCDATA)>
<!ELEMENT down (#PCDATA)>
From the Raw Dataset to Global Dataset
First step was removing all the fields that were not used in the project. For example the
following command gets rid of all the information in the “magnet” field from the file
“raw.dataset.xml”, and saves the output into a new XML document
(“global.dataset.xml”).
sed '/^<magnet>/d' raw.dataset.xml > global.dataset.xml
Several of these commands can be concatenated in order to speed up the process.
In this example we get rid of the “magnet”, “quality” and “leechers” fields.
sed '/^<magnet>/d' raw.dataset.xml | sed '/^<quality>/d' |_
sed '/^<leechers>/d' > global.dataset.xml
In order to count the comments and then remove them we used something similar
to the following command, that counts the comments in each Torrent file and for each
one of them adds the “<countcomm>” field, which contains the actual number.
awk '/^<comment>/ {count++} /^<\/comments>/ {print; print_
"<countcomm>" count "<\/countcomm>"; count=0; next}1'_
raw.dataset.xml > global.dataset.xml
Then the actual comments were deleted following a similar method to the one
used in irrelevant or redundant fields.
From the Global Dataset to the Media Dataset
The only step to go from the Global to the Media Dataset is removing all those files that
do not belong to one of the big four formats. To do so first we created a different XML
file with all the entries that contained patterns that matched the formats we were
studying. For example for MP4 the command was
awk '/^<Torrent>/ {pre_print=1; do_print=0}_ pre_print==1_
{y=x "\n" $0; x=y}; do_print==1 {print}_ /[Mm][Pp]4/_
{do_print=1; pre_print=0; print y; y=""}_ /^<\/Torrent>/_
{do_print=0; y=""; x=""}' global.dataset.xml > mp4.data
In the case of AVI we had to be extra careful since many files contained the
pattern “AVI” without them actually being wrapped in it (for example, the previous
code for AVI would return an entry like “The Aviator.mp4”). For this reason, in that
case the pattern had to be either preceded by a dot (“.”) or an isolated word (surrounded
by blank spaces).
Once we had the XML files for each separate format a new field <format> was
added to them containing the format (obviously), and then all the files were
concatenated into “media.dataset.xml” with the following command
cat mp4.data avi.data mkv.data wmv.data > media.dataset.xml
At this point the “<nfo>” field, containing miscellaneous information on the
Torrent was deleted.
From the Media Dataset to the TV Shows Dataset
The following code gets all the files for the television show “The Walking Dead” and
puts them into the file “TheWalkingDead.data”. It also adds a “<show>” field
(containing in this case “TheWalkingDead”) which was used as the group variable in
the regressions.
awk '/^<Torrent>/ {pre_print=1; do_print=0} pre_print==1_
{y=x "\n" $0; x=y}; do_print==1 {print}_
/[Tt]he.[Ww]alking.[Dd]ead/ {do_print=1; pre_print=0; print_
y; y=""} /[Tt]he[Ww]alking[Dd]ead/ {do_print=1;_
pre_print=0; print y; y=""} /^<\/Torrent>/ {do_print=0;_
y=""; x=""}' media.dataset.xml | awk '/^<countcomm>/_
{print;print "<show>TheWalkingDead<\/show>";next}1' > _
TheWalkingDead.data
Finally all the television show files generated were concatenated into
“shows.dataset.xml” following a similar procedure as the one described before.
12. Appendix 2 – Descriptive statistics for television shows
Show                     Freq.   Average MP4 user share   Average AVI user share   Average MKV user share   Average WMV user share
American Dad 37 0.1477 0.8207 0.0316 0.0000
American Idol 21 0.0660 0.8041 0.0823 0.0476
Archer 20 0.5305 0.1750 0.2945 0.0000
Battlestar Galactica 34 0.1082 0.7162 0.1609 0.0147
Big Bang Theory 44 0.1997 0.4933 0.2843 0.0227
Boston Legal 8 0.1250 0.7500 0.1250 0.0000
Breaking Bad 24 0.1598 0.6691 0.1711 0.0000
CSI 48 0.0487 0.8372 0.1141 0.0000
Chuck 37 0.3522 0.5766 0.0441 0.0270
Cold Case 13 0.0000 0.9650 0.0350 0.0000
Community 25 0.0911 0.7455 0.1434 0.0200
Criminal Minds 38 0.1913 0.6066 0.2020 0.0000
Deadwood 7 0.0000 0.8571 0.1429 0.0000
Desperate Housewives 45 0.0692 0.8522 0.0787 0.0000
Dexter 33 0.1288 0.7135 0.1577 0.0000
Family Guy 59 0.3015 0.5823 0.0989 0.0173
Fringe 38 0.1722 0.5753 0.2525 0.0000
Game Of Thrones 10 0.4153 0.2318 0.3479 0.0050
Gilmore Girls 7 0.0000 1.0000 0.0000 0.0000
Gossip Girl 32 0.2937 0.6166 0.0897 0.0000
How I Met Your Mother 46 0.2493 0.4275 0.3233 0.0000
Jersey Shore 7 0.4603 0.5397 0.0000 0.0000
King Of The Hill 4 0.0000 0.8750 0.1250 0.0000
Last Airbender 13 0.3382 0.3867 0.1185 0.1566
Mad Men 21 0.1021 0.7166 0.1814 0.0000
Masterchef 7 0.4286 0.5714 0.0000 0.0000
Modern Family 27 0.0996 0.7371 0.1633 0.0000
My Name Is Earl 17 0.0000 1.0000 0.0000 0.0000
NCIS 51 0.2410 0.6633 0.0761 0.0196
One Tree Hill 28 0.0134 0.9866 0.0000 0.0000
Prison Break 36 0.0762 0.8232 0.1006 0.0000
Rescue Me 13 0.0833 0.8776 0.0362 0.0028
Robot Chicken 9 0.1453 0.6880 0.1667 0.0000
Scrubs 24 0.2575 0.6650 0.0358 0.0417
Simpsons 66 0.2445 0.6376 0.1179 0.0000
Smallville 46 0.1841 0.7281 0.0878 0.0000
Sons Of Anarchy 4 0.4250 0.0000 0.5750 0.0000
South Park 52 0.2076 0.6715 0.1123 0.0085
Star Wars Clone Wars 33 0.2117 0.4986 0.2844 0.0053
Stargate 59 0.1883 0.6575 0.1494 0.0048
The Closer 10 0.1857 0.8143 0.0000 0.0000
The Mole 4 0.5000 0.5000 0.0000 0.0000
The Office 47 0.2581 0.6005 0.0921 0.0493
The Sopranos 7 0.4286 0.4286 0.1429 0.0000
The Walking Dead 14 0.2229 0.3337 0.4434 0.0000
The West Wing 4 0.0000 1.0000 0.0000 0.0000
Ugly Betty 13 0.0000 1.0000 0.0000 0.0000
Vampire Diaries 26 0.0618 0.7400 0.1597 0.0385
Young Justice 13 0.0587 0.5152 0.4261 0.0000
iCarly 14 0.2143 0.4286 0.3571 0.0000
Show  Freq.  Average MP4 upload share  Average AVI upload share  Average MKV upload share  Average WMV upload share
American Dad 37 0.12 0.78 0.10 0.00
American Idol 21 0.05 0.75 0.08 0.11
Archer 20 0.15 0.57 0.28 0.00
Battlestar Galactica 34 0.08 0.72 0.18 0.01
Big Bang Theory 44 0.16 0.51 0.30 0.02
Boston Legal 8 0.08 0.83 0.08 0.00
Breaking Bad 24 0.18 0.64 0.18 0.00
CSI 48 0.06 0.72 0.20 0.03
Chuck 37 0.10 0.74 0.16 0.00
Cold Case 13 0.09 0.79 0.12 0.00
Community 25 0.04 0.72 0.20 0.04
Criminal Minds 38 0.32 0.57 0.12 0.00
Deadwood 7 0.00 0.78 0.22 0.00
Desperate Housewives 45 0.05 0.83 0.12 0.00
Dexter 33 0.12 0.73 0.14 0.01
Family Guy 59 0.28 0.60 0.10 0.02
Fringe 38 0.12 0.58 0.30 0.00
Game Of Thrones 10 0.19 0.36 0.44 0.00
Gilmore Girls 7 0.00 1.00 0.00 0.00
Gossip Girl 32 0.17 0.66 0.17 0.00
How I Met Your Mother 46 0.20 0.49 0.32 0.00
Jersey Shore 7 0.08 0.77 0.15 0.00
King Of The Hill 4 0.24 0.00 0.76 0.00
Last Airbender 13 0.20 0.58 0.22 0.00
Mad Men 21 0.23 0.55 0.14 0.08
Masterchef 7 0.15 0.73 0.12 0.00
Modern Family 27 0.18 0.75 0.07 0.00
My Name Is Earl 17 0.49 0.13 0.38 0.00
NCIS 51 0.08 0.80 0.07 0.04
One Tree Hill 28 0.17 0.74 0.09 0.01
Prison Break 36 0.06 0.92 0.02 0.00
Rescue Me 13 0.10 0.78 0.11 0.01
Robot Chicken 9 0.09 0.68 0.19 0.04
Scrubs 24 0.13 0.78 0.09 0.00
Simpsons 66 0.25 0.66 0.06 0.03
Smallville 46 0.20 0.60 0.19 0.00
Sons Of Anarchy 4 0.14 0.57 0.29 0.00
South Park 52 0.22 0.64 0.13 0.01
Star Wars Clone Wars 33 0.21 0.64 0.14 0.01
Stargate 59 0.24 0.55 0.20 0.01
The Closer 10 0.12 0.61 0.26 0.00
The Mole 4 0.35 0.34 0.31 0.00
The Office 47 0.16 0.74 0.06 0.04
The Sopranos 7 0.34 0.31 0.27 0.08
The Walking Dead 14 0.48 0.24 0.26 0.02
The West Wing 4 0.26 0.51 0.23 0.00
Ugly Betty 13 0.15 0.56 0.29 0.00
Vampire Diaries 26 0.07 0.85 0.04 0.03
Young Justice 13 0.08 0.56 0.36 0.00
iCarly 14 0.14 0.37 0.49 0.00
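As a sanity check on the two tables, the four format shares for a given show appear to partition that show's files, so each row should sum to one up to rounding. A quick check with a few values transcribed from the rows above:

```python
# Shares transcribed from a few rows of the tables above
# (MP4, AVI, MKV, WMV); each row should sum to ~1 up to rounding.
rows = {
    "Archer (user share)":          (0.5305, 0.1750, 0.2945, 0.0000),
    "Game Of Thrones (user share)": (0.4153, 0.2318, 0.3479, 0.0050),
    "American Idol (upload share)": (0.05, 0.75, 0.08, 0.11),
}
for show, shares in rows.items():
    assert abs(sum(shares) - 1.0) < 0.025, show
```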
13. Appendix 3 – Stata output of the regressions
Stata output for the Hausman test for the User Share specification of MP4
. hausman fixed random
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
fixed random Difference S.E.
mp4ibuplag .4599806 .4024638 .0575168 .0035099
tvibusmp4j .012848 .3178492 -.3050012 .0571412
d2005 .0944475 .06784 .0266075 .
d2006 .0668862 .023194 .0436922 .
d2007 .1034385 .0669219 .0365166 .
d2008 .148703 .0865165 .0621865 .
d2009 .0372154 -.0198837 .0570991 .
d2010 .1371569 .0788659 .058291 .
d2011 .0207218 -.0311989 .0519207 .
d2012 .2007474 .1654744 .0352731 .
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(10) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 4.37
Prob>chi2 = 0.9293
(V_b-V_B is not positive definite)
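The chi2(10) statistic above is the quadratic form printed in the output. As an illustration only, here is the same computation in NumPy with hypothetical two-coefficient fixed- and random-effects estimates (these numbers are not the thesis data):

```python
import numpy as np

# Hypothetical FE (b) and RE (B) estimates and covariance matrices,
# purely for illustration of the formula shown in the Stata output.
b   = np.array([0.5, 0.1])    # fixed effects: consistent under Ho and Ha
B   = np.array([0.4, 0.3])    # random effects: efficient under Ho
V_b = np.diag([0.05, 0.02])
V_B = np.diag([0.04, 0.01])

d = b - B
# Hausman statistic: (b-B)' [V_b - V_B]^(-1) (b-B)
H = d @ np.linalg.inv(V_b - V_B) @ d
```

Under Ho the statistic follows a chi-squared distribution with as many degrees of freedom as tested coefficients; large values reject the random-effects specification. The values reported in this appendix (4.37 and 8.48 on 10 degrees of freedom) are far below the rejection region, so the random-effects models are retained.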
Stata output for the random effects regression of MP4 user share using the 2004-2012 panel and year dummies
. xtreg mp4usershare mp4ibuplag tvibusmp4j d2005 d2006 d2007 d2008 d2009 d2010
> d2011 d2012, mle
Fitting constant-only model:
Iteration 0: log likelihood = -374.33886
Iteration 1: log likelihood = -365.58808
Iteration 2: log likelihood = -364.79654
Iteration 3: log likelihood = -364.77108
Iteration 4: log likelihood = -364.77103
Fitting full model:
Iteration 0: log likelihood = -293.38403
Iteration 1: log likelihood = -291.94392
Iteration 2: log likelihood = -291.93558
Iteration 3: log likelihood = -291.93558
Random-effects ML regression Number of obs = 1255
Group variable: shows Number of groups = 50
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 25.1
max = 65
LR chi2(10) = 145.67
Log likelihood = -291.93558 Prob > chi2 = 0.0000
mp4usershare Coef. Std. Err. z P>|z| [95% Conf. Interval]
mp4ibuplag .4122865 .1983688 2.08 0.038 .0234907 .8010822
tvibusmp4j .2704914 .0892177 3.03 0.002 .0956278 .4453549
d2005 .0732896 .2217404 0.33 0.741 -.3613137 .5078928
d2006 .0305307 .2194145 0.14 0.889 -.3995139 .4605753
d2007 .0726784 .2195981 0.33 0.741 -.3577261 .5030828
d2008 .0965474 .2197812 0.44 0.660 -.3342158 .5273107
d2009 -.0099462 .2232052 -0.04 0.964 -.4474203 .427528
d2010 .0897151 .2240411 0.40 0.689 -.3493974 .5288276
d2011 -.0206898 .2242757 -0.09 0.926 -.460262 .4188825
d2012 .1736053 .2351182 0.74 0.460 -.2872179 .6344285
_cons -.0221572 .2164335 -0.10 0.918 -.4463591 .4020448
/sigma_u .0642748 .0175428 .037646 .1097396
/sigma_e .3011312 .0062099 .2892026 .3135517
rho .0435735 .0231201 .0138579 .1112324
Likelihood-ratio test of sigma_u=0: chibar2(01)= 9.09 Prob>=chibar2 = 0.001
. display "pseudo R2=" (e(ll_0)-e(ll))/e(ll_0)
pseudo R2=.19967444
Stata output for the Hausman test for the Uploader Share specification of MP4
. hausman fixed random
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
fixed random Difference S.E.
mp4ibuslag .9148485 .9079244 .0069241 .0208008
tvibusmp4j -.0165451 .0086567 -.0252018 .033146
d2005 -.1796632 -.1801901 .0005269 .0172514
d2006 -.1073136 -.1159659 .0086523 .0167336
d2007 -.1483019 -.1632687 .0149667 .0161921
d2008 -.2231884 -.2336473 .0104589 .0171741
d2009 -.1713512 -.1845737 .0132225 .0171765
d2010 -.1935188 -.2097648 .016246 .0172677
d2011 -.2282999 -.2460666 .0177668 .0176648
d2012 -.2830568 -.2971939 .014137 .0178864
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(10) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 8.48
Prob>chi2 = 0.5821
Stata output for the random effects regression of MP4 uploader share using the 2004-2012 panel and year dummies
. xtreg mp4uploadshare mp4ibuslag tvibusmp4j d2004 d2005 d2006 d2007 d2008 d2009
> d2010 d2011, mle
Fitting constant-only model:
Iteration 0: log likelihood = -205.44047
Iteration 1: log likelihood = -205.10224
Iteration 2: log likelihood = -205.10104
Fitting full model:
Iteration 0: log likelihood = -181.95025
Iteration 1: log likelihood = -180.34638
Iteration 2: log likelihood = -180.3429
Iteration 3: log likelihood = -180.3429
Random-effects ML regression Number of obs = 1491
Group variable: shows Number of groups = 50
Random effects u_i ~ Gaussian Obs per group: min = 4
avg = 29.8
max = 70
LR chi2(10) = 49.52
Log likelihood = -180.3429 Prob > chi2 = 0.0000
mp4uploadsh~e Coef. Std. Err. z P>|z| [95% Conf. Interval]
mp4ibuslag .9075766 .1694564 5.36 0.000 .5754483 1.239705
tvibusmp4j .0113603 .0641271 0.18 0.859 -.1143265 .1370472
d2004 .2988555 .2001379 1.49 0.135 -.0934076 .6911186
d2005 .118147 .0636625 1.86 0.063 -.0066291 .2429231
d2006 .1818232 .0539996 3.37 0.001 .0759859 .2876605
d2007 .134018 .0468656 2.86 0.004 .0421632 .2258729
d2008 .0638723 .0396789 1.61 0.107 -.0138969 .1416415
d2009 .11274 .0358199 3.15 0.002 .0425343 .1829456
d2010 .0873114 .0304604 2.87 0.004 .0276101 .1470127
d2011 .0508353 .0320134 1.59 0.112 -.0119098 .1135804
_cons -.0960992 .0593363 -1.62 0.105 -.2123962 .0201978
/sigma_u .0800258 .0125805 .0588053 .1089039
/sigma_e .2677043 .0049919 .2580969 .2776693
rho .0820309 .0240443 .0443224 .139962
Likelihood-ratio test of sigma_u=0: chibar2(01)= 52.81 Prob>=chibar2 = 0.000
. display "pseudo R2=" (e(ll_0)-e(ll))/e(ll_0)
pseudo R2=.12071189
Stata output for the random effects regression of AVI user share using the 2004-2012 panel and year dummies
. xtreg aviusershare aviibuplag tvibusavij d2005 d2006 d2007 d2008 d2009 d2010
> d2011 d2012, mle
Fitting constant-only model:
Iteration 0: log likelihood = -614.54308
Iteration 1: log likelihood = -587.16213
Iteration 2: log likelihood = -582.33959
Iteration 3: log likelihood = -581.74952
Iteration 4: log likelihood = -581.72998
Iteration 5: log likelihood = -581.72995
Fitting full model:
Iteration 0: log likelihood = -466.15552
Iteration 1: log likelihood = -463.91868
Iteration 2: log likelihood = -463.90312
Iteration 3: log likelihood = -463.90312
Random-effects ML regression Number of obs = 1255
Group variable: shows Number of groups = 50
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 25.1
max = 65
LR chi2(10) = 235.65
Log likelihood = -463.90312 Prob > chi2 = 0.0000
aviusershare Coef. Std. Err. z P>|z| [95% Conf. Interval]
aviibuplag .2105681 .1496401 1.41 0.159 -.082721 .5038573
tvibusavij .2947808 .0709902 4.15 0.000 .1556426 .4339189
d2005 -.0836263 .2535782 -0.33 0.742 -.5806304 .4133777
d2006 -.0368414 .2506619 -0.15 0.883 -.5281298 .4544469
d2007 -.168287 .2507594 -0.67 0.502 -.6597665 .3231924
d2008 -.1735844 .2502113 -0.69 0.488 -.6639895 .3168208
d2009 -.1198652 .2538927 -0.47 0.637 -.6174857 .3777553
d2010 -.1883493 .2586176 -0.73 0.466 -.6952305 .3185319
d2011 -.141057 .2595698 -0.54 0.587 -.6498045 .3676905
d2012 -.4311116 .2635305 -1.64 0.102 -.9476218 .0853986
_cons .5532267 .2802638 1.97 0.048 .0039198 1.102534
/sigma_u .0842322 .0195723 .0534185 .1328203
/sigma_e .3444233 .0070843 .3308144 .358592
rho .0564343 .0251837 .0216804 .1249395
Likelihood-ratio test of sigma_u=0: chibar2(01)= 14.01 Prob>=chibar2 = 0.000
. display "pseudo R2=" (e(ll_0)-e(ll))/e(ll_0)
pseudo R2=.20254558
Stata output for the random effects regression of AVI uploader share using the 2004-2012 panel and year dummies
. xtreg aviuploadshare aviibuslag tvibusavij d2005 d2006 d2007 d2008 d2009 d2010
> d2011 d2012 ,mle
Fitting constant-only model:
Iteration 0: log likelihood = -586.84842
Iteration 1: log likelihood = -584.89276
Iteration 2: log likelihood = -584.79917
Iteration 3: log likelihood = -584.79854
Fitting full model:
Iteration 0: log likelihood = -551.7956
Iteration 1: log likelihood = -549.91838
Iteration 2: log likelihood = -549.86191
Iteration 3: log likelihood = -549.8617
Random-effects ML regression Number of obs = 1491
Group variable: shows Number of groups = 50
Random effects u_i ~ Gaussian Obs per group: min = 4
avg = 29.8
max = 70
LR chi2(10) = 69.87
Log likelihood = -549.8617 Prob > chi2 = 0.0000
aviuploadsh~e Coef. Std. Err. z P>|z| [95% Conf. Interval]
aviibuslag .5039363 .1608559 3.13 0.002 .1886645 .8192081
tvibusavij .2637252 .0593309 4.44 0.000 .1474388 .3800116
d2005 .0034731 .2500632 0.01 0.989 -.4866419 .493588
d2006 -.0102054 .2469804 -0.04 0.967 -.4942781 .4738673
d2007 -.0650242 .2473783 -0.26 0.793 -.5498766 .4198283
d2008 -.0289777 .2467597 -0.12 0.907 -.5126179 .4546625
d2009 .0444943 .248639 0.18 0.858 -.4428293 .5318179
d2010 .0872295 .2528111 0.35 0.730 -.4082711 .58273
d2011 .1185005 .2542021 0.47 0.641 -.3797265 .6167274
d2012 .0701628 .2580629 0.27 0.786 -.4356312 .5759568
_cons .141896 .2791412 0.51 0.611 -.4052107 .6890026
/sigma_u .1463492 .0205957 .1110711 .1928323
/sigma_e .3399712 .0063566 .327738 .352661
rho .1563382 .0378037 .0936167 .2418272
Likelihood-ratio test of sigma_u=0: chibar2(01)= 93.12 Prob>=chibar2 = 0.000
. display "pseudo R2=" (e(ll_0)-e(ll))/e(ll_0)
pseudo R2=.05974167
14. Appendix 4 – MKV regressions and comments
Stata output for the random effects regression of MKV user share using the 2004-2012 panel and year dummies
. xtreg mkvusershare mkvibuplag tvibusmkvj d2005 d2006 d2007 d2008 d2009 d2010
> d2011 d2012, mle
Fitting constant-only model:
Iteration 0: log likelihood = -116.86974
Iteration 1: log likelihood = -103.3203
Iteration 2: log likelihood = -102.22466
Iteration 3: log likelihood = -102.18585
Iteration 4: log likelihood = -102.18574
Fitting full model:
Iteration 0: log likelihood = -44.050166
Iteration 1: log likelihood = -43.214718
Iteration 2: log likelihood = -43.211549
Iteration 3: log likelihood = -43.211548
Random-effects ML regression Number of obs = 1255
Group variable: shows Number of groups = 50
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 25.1
max = 65
LR chi2(10) = 117.95
Log likelihood = -43.211548 Prob > chi2 = 0.0000
mkvusershare Coef. Std. Err. z P>|z| [95% Conf. Interval]
mkvibuplag -.0493801 .1395335 -0.35 0.723 -.3228608 .2241006
tvibusmkvj .3232374 .0771018 4.19 0.000 .1721206 .4743542
d2005 -.0029317 .181876 -0.02 0.987 -.3594021 .3535386
d2006 -.0080571 .179619 -0.04 0.964 -.3601039 .3439897
d2007 .029761 .1795184 0.17 0.868 -.3220885 .3816106
d2008 .0604957 .1786583 0.34 0.735 -.2896682 .4106595
d2009 .0990813 .1788387 0.55 0.580 -.2514362 .4495987
d2010 .1080792 .1828104 0.59 0.554 -.2502226 .466381
d2011 .1747784 .1831672 0.95 0.340 -.1842227 .5337794
d2012 .2084746 .1808754 1.15 0.249 -.1460347 .5629839
_cons .0033926 .1778643 0.02 0.985 -.3452151 .3520002
/sigma_u .0436975 .0110393 .0266328 .0716961
/sigma_e .2477991 .0050423 .2381108 .2578817
rho .0301588 .0149637 .0105102 .0736249
Likelihood-ratio test of sigma_u=0: chibar2(01)= 10.00 Prob>=chibar2 = 0.001
. display "pseudo R2=" (e(ll_0)-e(ll))/e(ll_0)
pseudo R2=.57712741
Stata output for the random effects regression of MKV uploader share using the 2004-2012 panel and year dummies
. xtreg mkvuploadshare mkvibuslag tvibusmkvj d2005 d2006 d2007 d2008 d2009 d2010
> d2011 d2012 , mle
Fitting constant-only model:
Iteration 0: log likelihood = -115.36743
Iteration 1: log likelihood = -113.87002
Iteration 2: log likelihood = -113.79839
Iteration 3: log likelihood = -113.79799
Fitting full model:
Iteration 0: log likelihood = -86.005337
Iteration 1: log likelihood = -84.69051
Iteration 2: log likelihood = -84.668561
Iteration 3: log likelihood = -84.668528
Random-effects ML regression Number of obs = 1491
Group variable: shows Number of groups = 50
Random effects u_i ~ Gaussian Obs per group: min = 4
avg = 29.8
max = 70
LR chi2(10) = 58.26
Log likelihood = -84.668528 Prob > chi2 = 0.0000
mkvuploadsh~e Coef. Std. Err. z P>|z| [95% Conf. Interval]
mkvibuslag .2217118 .2228183 1.00 0.320 -.2150041 .6584276
tvibusmkvj .3341119 .0784412 4.26 0.000 .1803699 .4878538
d2005 .1820709 .1838887 0.99 0.322 -.1783443 .5424862
d2006 .1234724 .1817065 0.68 0.497 -.2326657 .4796105
d2007 .1740835 .181261 0.96 0.337 -.1811816 .5293486
d2008 .2278536 .1806629 1.26 0.207 -.1262392 .5819463
d2009 .1450172 .1811716 0.80 0.423 -.2100725 .500107
d2010 .1333542 .1843655 0.72 0.469 -.2279954 .4947039
d2011 .1713201 .184535 0.93 0.353 -.190362 .5330021
d2012 .1729023 .1862589 0.93 0.353 -.1921584 .537963
_cons -.0463475 .1808427 -0.26 0.798 -.4007926 .3080977
/sigma_u .090553 .013442 .0676936 .1211317
/sigma_e .2499455 .0046692 .2409596 .2592666
rho .1160257 .0309744 .0660476 .1882694
Likelihood-ratio test of sigma_u=0: chibar2(01)= 75.68 Prob>=chibar2 = 0.000
. display "pseudo R2=" (e(ll_0)-e(ll))/e(ll_0)
pseudo R2=.25597519
Some notes on the case of MKV
MKV is interesting in that direct network effects seem to have dominated adoption for
both users and uploaders, with the indirect ones lacking statistical significance in either
case (although in the uploaders’ specification they come close). One of the format’s most
important advantages over its rivals is its almost universal compatibility with video and
audio codecs, present and, thanks to its architecture, future. This makes it an attractive
choice for uploaders, since technical improvements on the codec side can be
accommodated without changing formats. On the user side it is attractive because it
supports most of the features characteristic of retail media (i.e. DVD and Blu-ray), such
as 3D, menus, HD and captions.
A first explanation for what we observe is that MKV, unlike MP4, is strictly an
audiovisual wrapper, so it has a harder time building an installed base: the range of
contents it can offer is more limited. This, together with its lack of compatibility with
devices other than PCs, would partially explain its reliance on direct network effects for
its diffusion. An additional factor that may explain the significance of the direct network
effect in the uploaders’ specification is that the format is sought after and explicitly
requested by users, something future research could check by examining the content of
the comments left on TPB.
Finally, the pseudo coefficients of determination for the MKV regressions were
considerably higher than those for MP4 or AVI, which supports our reading of an
adoption process less conditioned by external factors and driven mostly by network
effects, which is precisely what our econometric model accounts for. It remains to be
seen whether, over time (as users start asking more for the features at which MKV has a
comparative advantage, or as the format’s compatibility improves), its adoption
dynamics will take a form similar to the one we observed for MP4.
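The pseudo coefficient of determination quoted under each Stata output is computed as (ll_0 - ll)/ll_0, i.e. McFadden's pseudo R². A quick check in Python with the log likelihoods reported in the outputs above:

```python
def pseudo_r2(ll_0, ll):
    """McFadden pseudo R^2: (ll_0 - ll) / ll_0, where ll_0 is the
    constant-only log likelihood and ll the full-model log likelihood."""
    return (ll_0 - ll) / ll_0

# Log likelihoods as reported in the Stata outputs:
mkv_user = pseudo_r2(-102.18574, -43.211548)   # MKV user share regression
mp4_user = pseudo_r2(-364.77103, -291.93558)   # MP4 user share regression
```

These reproduce the reported values of .57712741 and .19967444, so the noticeably better fit of the MKV specifications is not a transcription artifact.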
15. Appendix 5 – Theoretical model derivations
Derivation of the price indexes
Consumers that adopt the technology may purchase their goods from all the companies
operating in the market, but will do so at different prices depending on whether or not
the firm is also an adopter. Firms in the [n-qn] interval are not adopters, while firms in
the [qn-0] interval have adopted and sell at the adopters' price. The adopters' price
index therefore aggregates the prices set in both intervals:
[1] (adopters' price index; the displayed expression was lost in the conversion of this copy)
Non-adopting consumers are only able to purchase from non-adopting firms, so their
price index only accounts for companies in the [n-qn] interval, all of which sell at the
same price:
[2] (non-adopters' price index; expression lost in conversion)
Derivation of consumer utilities
Based on the demand each consumer has for each specific good, we defined the utility
of a consumer that has adopted the technology as an aggregate over the goods purchased
from all firms. Applying the same method as before, we divide the companies into those
that have adopted and those that have not, and substituting the price index of expression
[1] we obtain
[3] (utility of an adopting consumer; expression lost in conversion)
Consumers that have not adopted purchase only from companies in the [n-qn] interval,
so all the goods they buy share a single price and their utility simplifies accordingly.
Substituting the price index of equation [2], the expression becomes
[4] (utility of a non-adopting consumer; expression lost in conversion)
Derivation of the consumers’ adoption threshold
The adoption threshold was defined as the difference between the adopters’ and the
non-adopters’ utility functions:
[5] (adoption threshold; expression lost in conversion)
Plugging the expressions of equations [3] and [4] into equation [5] and rearranging
yields the adoption threshold shown in Section 8.
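For reference, the two adoption-cost thresholds that this derivation leads to can be transcribed into standard notation directly from the simulation code in Appendix 6, exactly as that code computes them (writing C for the number of consumers, `cons`, and c for the consumers' cost parameter, `cost`; s and q are the consumer and firm adoption shares):

```latex
k_f = \frac{C}{n}\,(1-\alpha)\,
      \frac{s\,\beta^{\frac{\alpha}{\alpha-1}}-1}
           {1+q\left(\beta^{\frac{\alpha}{\alpha-1}}-1\right)}

k_c = \frac{\alpha}{c}\, n^{\frac{2-\alpha}{\alpha}}
      \left[\left(1-q+q\,\beta^{\frac{\alpha}{\alpha-1}}\right)^{\frac{1-\alpha}{\alpha}}
      - \left(1-q\right)^{\frac{2-\alpha}{\alpha}}\right]
```

A firm adopts when its adoption cost falls below k_f, and a consumer when it falls below k_c.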
16. Appendix 6 – Code used in the simulation
All text after a semicolon corresponds to annotations and has no effect on the execution
of the simulation. The underscore symbol “_” at the end of a line indicates that the line
was truncated for it to fit into this document. Copying and pasting the code into
NetLogo will not work because, due to the application’s restrictions, some of the
parameters are initialized in the simulation interface. However, a fully functioning copy
of the file is available at:
http://tinyurl.com/JNG-FP-SIM
;code for the microeconomic model simulation
breed [firms firm] ;initialize firms class
breed [consumers consumer] ;initialize consumers class
globals [q s kf kc aux stop?] ;initialize global variables
firms-own [tech? fadoptionCost] ;initialize firm-specific variables
consumers-own [adopter? adoptionCost] ;initialize consumer-specific
;variables
to setup ;prepare the simulation
__clear-all-and-reset-ticks
random-seed 6267753 ;set random seed to allow for replication of the
;results
set q 0
set s 0
set aux 0
set stop? false
create-firms n ;create firms
[ set tech? false
;give firms an iid adoption cost
set fadoptionCost (fkmax / n) * aux
set aux aux + 1]
set aux 0
create-consumers cons
[ set adopter? false
;give users an iid adoption cost
set adoptionCost (ckmax / cons) * aux
set aux aux + 1
;random day zero adoption (a proportion ibc of consumers adopts)
;command random 10 returns a random value in the [0-9] interval
if (random 10) + 1 > (10 - ibc) [set adopter? true]]
end
to go ;main procedure, this is executed every cycle
updateGlobalVars ;update global variables
updateBreedVars ;update variables owned by firms or consumers
tick
;stop the simulation if the stopping condition is fulfilled
if stop? [stop]
end
to updateGlobalVars
;set the stopping condition to true if adoption shares don't change
if count consumers with [adopter? = true] / cons = s and count_
firms with [tech? = true] / n = q [set stop? true]
;update adoption shares for firms and consumers
set s count consumers with [adopter? = true] / cons
set q count firms with [tech? = true] / n
;update threshold adoption costs for firms
set kf (cons / n) * (1 - alpha) * (s * (beta ^ (alpha / (alpha -_
1))) - 1) / (1 + q * ((beta ^ (alpha / (alpha - 1))) - 1))
;update threshold adoption costs for customers
set kc (alpha / cost) * (n ^ ((2 - alpha) / alpha)) * (((1 - q + q_
* (beta ^ (alpha / (alpha - 1)))) ^ ((1 - alpha) / alpha)) - (1 - q)_
^ ((2 - alpha) / alpha))
end
to updateBreedVars
;ask consumers that have not adopted yet to do so if their adoption
;cost is smaller than the threshold for consumers
ask consumers with [not adopter?] [if adoptionCost <= kc [set_
adopter? true]]
;ask firms that have not adopted yet to do so if their adoption cost
;is smaller than the threshold for firms
ask firms with [not tech?] [if fadoptionCost <= kf [set tech? true]]
end
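The NetLogo loop above can also be read as a simple fixed-point iteration on the two adoption shares. A compact Python sketch of the same dynamics follows; the parameter values are illustrative (the real ones live in the simulation interface), and, unlike the NetLogo version, day-zero adopters are taken deterministically as the ibc/10 cheapest consumers so the sketch is reproducible:

```python
import numpy as np

def simulate(n=50, cons=100, alpha=0.5, beta=0.5, cost=1.0,
             fkmax=1.0, ckmax=1.0, ibc=6):
    """Iterate the firm/consumer adoption thresholds until shares converge.
    Illustrative parameter values, not the thesis calibration."""
    f_cost = (fkmax / n) * np.arange(n)        # firm adoption costs
    c_cost = (ckmax / cons) * np.arange(cons)  # consumer adoption costs
    firm = np.zeros(n, dtype=bool)
    consumer = np.arange(cons) < cons * ibc / 10.0  # day-zero adopters
    B = beta ** (alpha / (alpha - 1))          # B > 1 whenever beta < 1
    while True:
        s, q = consumer.mean(), firm.mean()
        # threshold adoption costs, as in updateGlobalVars above
        kf = (cons / n) * (1 - alpha) * (s * B - 1) / (1 + q * (B - 1))
        kc = (alpha / cost) * n ** ((2 - alpha) / alpha) * (
            (1 - q + q * B) ** ((1 - alpha) / alpha)
            - (1 - q) ** ((2 - alpha) / alpha))
        consumer |= c_cost <= kc   # agents below the threshold adopt
        firm |= f_cost <= kf
        if consumer.mean() == s and firm.mean() == q:
            return s, q            # shares unchanged: fixed point reached
```

With these illustrative parameters an initial consumer base of 60% tips the market (consumer adoption reaches 1 and most firms follow), while a 30% base stays put, which is the kind of critical-mass behaviour the model studies.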
17. References
ARMSTRONG, M., 2006, ‘Competition in Two-Sided Markets’, The RAND Journal of
Economics, 37(3): 668-691
ARTHUR, W. B., 1989, ‘Competing Technologies, Increasing Returns, and Lock-In by
Historical Events’, Economic Journal, 99: 116-131
ARTHUR, W. B., 1994, Increasing Returns and Path Dependence in the Economy, Ann
Arbor, MI: University of Michigan Press
CHURCH, J. and N. GANDAL, 1992, ‘Network Effects, Software Provision and
Standardization’, Journal of Industrial Economics, 40(1): 85-103
CHURCH, J. and N. GANDAL, 1993, ‘Complementary Network Externalities and
Technological Adoption’, International Journal of Industrial Organization, 11: 239-260
CLEMENTS M.T and H. OHASHI, 2004, ‘Indirect Network Effects and the Product
Cycle: Video Games in the U.S., 1994-2002’, Working Paper, University of Tokyo
CLEMENTS, M.T, 2003, ‘Inefficient Adoption of Technological Standards: Inertia and
Momentum Revisited’, Economic Inquiry, 43(3): 507-518
DAVIS, F. D., 1989, ‘Perceived usefulness, perceived ease of use, and user acceptance
of information technology’, MIS Quarterly, 13(3): 319–340
DIXIT, A. and STIGLITZ J., 1977, ‘Monopolistic Competition and Optimal Product
Diversity’, American Economic Review, 67: 297-308
FARRELL, J. and P. KLEMPERER, 2003, ‘Coordination and Lock-in: Competition
with Switching Costs and Network Effects’, Handbook of Industrial Organization 3,
Amsterdam: North Holland
FARRELL, J. and G. SALONER, 1985, ‘Standardization, Compatibility and
Innovation’, Rand Journal of Economics, 16: 70-83
FARRELL, J. and G. SALONER, 1986, ‘Standardization and Variety’, Economics
Letters, 20: 71-74
FARRELL, J. and G. SALONER, 1986, ‘Installed Base and Compatibility: Innovation,
Pre-announcement and Predation’, American Economic Review, 76(5): 940-955
GÖTZ, G., 1999, ‘Monopolistic competition and the diffusion of new technology’, The
RAND Journal of Economics, 30(4): 679-693
KATZ, M. L. and C. SHAPIRO, 1985, ‘Network Externalities, Competition and
Compatibility’, The American Economic Review, 75(3): 424-440
KATZ, M. L. and C. SHAPIRO, 1986, ‘Technology Adoption in the Presence of
Network Externalities’, Journal of Political Economy, 94(4): 822-841
KATZ, M. L. and C. SHAPIRO, 1994, ‘Systems Competition and Network Effects’,
The Journal of Economic Perspectives, 8: 93-115
OHASHI, H., 2003, ‘The Role of Network Effects in the U.S. VCR Market, 1978-86’,
Journal of Economics and Management Strategy, 12(4): 447-494
ROCHET, J. and J. TIROLE, 2003, ‘Platform Competition in Two-Sided Markets’,
Journal of the European Economic Association, 1: 990-1029
ROCHET, J. and J. TIROLE, 2006, ‘Two-Sided Markets: a Progress Report’, The
RAND Journal of Economics, 37(3): 645-667
RYSMAN, M., 2002, ‘Competition between Networks: A Study of the Market for
Yellow Pages’, Boston University
VARIAN, H. R., FARRELL, J. and C. SHAPIRO, 2004, The Economics of
Information Technology, Cambridge: Cambridge University Press
VARIAN, H. R. and C. SHAPIRO, 1999, ‘The Art of Standards Wars’, California
Management Review, 41(2): 8-32