
Andrew Brampton

Peer-to-Peer Media Streaming

B.Sc. Computer Science

March 2004

I certify that the material contained in this dissertation is my own work and does not contain significant portions of unreferenced or unacknowledged material. I also warrant that the above statement applies to the implementation of the project and all associated documentation.

Signed

Andrew Brampton

Date 19th March 2004


Abstract

Peer-to-peer (P2P) networks are quickly becoming a foundation for future internet applications; however, no one has applied the P2P paradigm to streaming continuous media. One of the key aspects of the future internet will be a multimedia-rich environment where video and audio streaming is commonplace between many different people. These services, however, have not yet appeared because of many technical problems. This report researches and designs a new concept that adapts existing P2P techniques and applies them to a streaming context, to provide a faster and more reliable transport medium for streaming media. If this system works as expected, anyone, regardless of bandwidth, could stream video to thousands of hosts without loss of performance by using the receiving peers’ bandwidth to help transmit the stream.

Working document URL: http://www.lancs.ac.uk/ug/brampton/fyp/

Contact Email: [email protected]


Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables

1 Introduction
    1.1 Overview of Streaming
    1.2 Overview of Peer-to-Peer
    1.3 Project Goals
    1.4 Why is this system needed?
    1.5 Report Structure

2 Background Reading
    2.1 History of Peer-to-Peer
        2.1.1 ARPANET and the early Internet
        2.1.2 Domain Name System (DNS)
        2.1.3 Usenet
    2.2 Recent P2P
        2.2.1 Napster
        2.2.2 Gnutella
        2.2.3 Fasttrack
        2.2.4 Gnutella2
        2.2.5 FreeNet
        2.2.6 Distributed.net
        2.2.7 SkyPe
        2.2.8 Bittorrent
    2.3 Streaming Technologies
        2.3.1 Multicast
        2.3.2 Batch Chaining
        2.3.3 NICE
        2.3.4 ZIGZAG
    2.4 Recent Research
        2.4.1 Pastry
        2.4.2 SplitStream
        2.4.3 Chord
    2.5 Summary

3 Design
    3.1 Requirements
        3.1.1 Provide a robust network
        3.1.2 Allow quick re-join after peer failure
        3.1.3 Stream data with low control overhead
        3.1.4 Move the stream distribution load away from the source
        3.1.5 Be scalable
        3.1.6 Media agnostic
        3.1.7 Be secure
    3.2 Peer-to-Peer Network
    3.3 Stream Representation
    3.4 Tracker


    3.5 Tracker-less Network
    3.6 Peer
    3.7 Source Peer
    3.8 Peer and Tracker Overview
    3.9 Tracker Protocol
        3.9.1 &peer-id=
        3.9.2 &peer-ip=
        3.9.3 &peer-port=
        3.9.4 /?action=join
        3.9.5 /?action=part
        3.9.6 /?action=list
        3.9.7 HTTP Headers
        3.9.8 X-BitStream-PartSize
        3.9.9 X-BitStream-ContentType
        3.9.10 X-BitStream-Title
    3.10 Peer Protocol
        3.10.1 Packets
        3.10.2 Packet Header
        3.10.3 Keep Alive
        3.10.4 Handshake
        3.10.5 Announcement
        3.10.6 Request
        3.10.7 Data
    3.11 Program Design
        3.11.1 PeerClient
        3.11.2 StreamBufferInterface
        3.11.3 PlaybackInterface
        3.11.4 PeerConnection
        3.11.5 PeerManager
        3.11.6 PeerPackets
    3.12 Algorithms
        3.12.1 Piece Picking Quality of Service
        3.12.2 Source Saturation Problem
        3.12.3 Pre-emptive Sending
    3.13 Code Testing Strategies
    3.14 System Evaluation Strategies
        3.14.1 Predicted Results
    3.15 Summary

4 Implementation
    4.1 Changes
        4.1.1 Tracker
        4.1.2 PeerManager
        4.1.3 Vorbis Ogg Playback Library
        4.1.4 Bitmap Class
    4.2 Problems Encountered
        4.2.1 StreamBuffer changing without notification
        4.2.2 Concurrency Issues
        4.2.3 Self Connecting Peer & Peers Connecting Both Ways
    4.3 Algorithms Used
        4.3.1 FindNextPiece


        4.3.2 PeerConnection Thread
    4.4 Summary

5 System in Operation
    5.1 Tracker
    5.2 PeerSource
    5.3 PeerClient
    5.4 Summary

6 Testing
    6.1 Unit Testing
        6.1.1 Bitmap Class
        6.1.2 StreamBuffer Class
    6.2 Integration Testing
        6.2.1 PeerConnection Class
    6.3 Performance Testing
    6.4 Summary

7 Evaluation
    7.1 Efficiency
    7.2 Overheads
    7.3 Summary

8 Conclusion
    8.1 Project Goals
    8.2 Future Work
    8.3 Summary

9 References

10 Appendix
    10.1 Bitmap Test Cases
    10.2 StreamBuffer Test Cases
    10.3 PeerConnection Test Cases
    10.4 Project Proposal


List of Figures

Figure 2.1 A simple Napster network
Figure 2.2 A search on a small Gnutella network
Figure 2.3 A simple query via super-nodes on Fasttrack
Figure 2.4 A BitTorrent network
Figure 2.5 Batch Chaining Technique
Figure 2.6 A NICE tree network
Figure 3.1 Diagram of Peers, Source Peer and Tracker
Figure 3.2 Representation of a stream
Figure 3.3 A tracker-less network
Figure 3.4 UML Sequence diagram of Peer and Tracker interactions
Figure 3.5 Packet Diagram
Figure 3.6 UML Diagram of different classes within the system
Figure 3.7 UML of different PeerPackets
Figure 4.1 UML Diagram of tracker design
Figure 4.2 UML Sequence diagram on how the tracker works internally
Figure 4.3 UML Sequence diagram of PeerManager connecting to a peer
Figure 4.4 UML Class Diagram of OggPlayback
Figure 4.5 UML Class Diagram of bitmap
Figure 4.6 Flowchart of FindNextPiece
Figure 5.1 Log generated by a tracker
Figure 5.2 Log generated by a PeerSource
Figure 5.3 Log generated by a PeerClient
Figure 6.1 UML Class Diagram of a PeerConnection
Figure 7.1 Graph of Percentage of the stream forwarded by non-source peers
Figure 7.2 Graph of protocol overheads depending on number of connected peers

List of Tables

Table 3.1 Sequence of events for acquiring a new piece
Table 6.1 List of tests carried out on the system
Table 6.2 Summarised results from 11 test cases
Table 7.1 Predicted overheads compared to observed overheads


1 Introduction

The aim of this project is to research, create and develop a new method of sending media data in a P2P (peer-to-peer) fashion by adapting existing P2P techniques to a streaming context. P2P is currently an extremely popular area, but little research has been carried out into distributing media that changes over time. The majority of P2P usage is for static data, for example images, documents, or pre-recorded videos. These types of media do not change and thus are more easily sent around a P2P network. This project will investigate current P2P and streaming media research and go on to design and implement a multi-source streaming technology.

1.1 Overview of Streaming

Media streaming is the concept of sending continuous media over a network in real time; the media may come from a data store or be created on the fly. A simple analogy is a radio station, which broadcasts audio over the airwaves. Each moment of audio is broadcast through the air for a fraction of a second, after which it is irretrievable. The same is true of network streaming, and it is even more critical when the media is compressed or encoded in a way that will not tolerate loss of any kind.

Radio and television broadcasts have been running for many years, whereas streaming technologies are comparatively new. Factors such as low-bandwidth hosts and high costs have limited streaming over the internet. Technical factors also play a large role in the limited success of streaming. Conventional radio waves are broadcast from a source and sent out in all directions. The internet, however, is made up of many single point-to-point links, which makes this kind of broadcast nearly impossible.

To add broadcast functionality to the internet, either the physical structure of the internet would have to change throughout the world, for example by adopting Multicast, or virtual overlay networks can be constructed. An overlay network logically behaves like a normal local network (i.e. it allows connections between hosts and provides services such as multicast/broadcast), but it may exist on top of many different physical networks. Implementing this presents many technical problems, such as scalability and reliability, and the task becomes increasingly difficult when the overlay is designed for a network as diverse as the internet.

1.2 Overview of Peer-to-Peer

Peer-to-Peer (P2P) is the technology that allows many networked hosts to connect together on an equal basis to share a given resource. This resource may be a file, processing power, or a hardware resource such as a printer, but in this case it will be a media stream. In recent years P2P has been used to help distribute content around a network, but until very recently it has only been used for trivial tasks. Inside a P2P network a virtual network is created which allows broadcast-style messages to be sent. This medium will hopefully be reliable, timely and scalable enough for streaming media to be transmitted to many thousands of hosts.

1.3 Project Goals

This project aims to investigate current P2P and streaming research topics and highlight any flaws in these systems. It will also integrate the previously unrelated topics of P2P and streaming into a single solution. This solution will be developed by improving existing techniques whilst solving any flaws they may have. The developed solution must satisfy a list of requirements, which will be derived and discussed in Chapter 3.

Once a suitable solution has been found, it will be subjected to numerous tests to establish its usefulness and to demonstrate how much more efficient or effective it is than current streaming solutions.

1.4 Why is this system needed?

The need for such a system becomes clear when looking forward to the future of the internet. More and more people are looking to use large-scale video conferencing, and companies such as the BBC are looking to offer their entire video archive online¹. Neither scenario is possible until the technology has improved. Once such a technology has been developed, many more as yet unknown uses will be devised by the general public. One possible use would be enabling anyone on the internet to set up their own radio/TV station with very low bandwidth, and feasibly stream to many thousands of hosts simultaneously.

Whatever the use of such a system, future research can build on this solution, which could, in theory, provide large-scale distribution of any kind of future media.

1.5 Report Structure

The report will be split into seven chapters. After the introduction, the report begins with the background reading chapter. This will investigate and evaluate past and current research in the fields of P2P and streaming. It will explain how current implementations function and highlight their strengths and weaknesses.

By using this new knowledge, Chapter 3 will start by deriving a list of requirements and continue to design a new solution.

Implementation details will be the focus of Chapter 4 which will be written once the design has been implemented in code. This chapter discusses any changes or problems encountered during the implementation phase.

Chapter 5 will cover testing and will be split into two distinct sections. The first will prove the correctness of the implementation with black box testing and similar strategies. The second will present and discuss results from the testing conducted on the system to prove its effectiveness.

Evaluation will be the focus of Chapter 6, which will discuss the test results gained in the second half of the testing section. This chapter will also try to explain why any results were better or worse than those predicted.

The final chapter will be the overall project conclusion, discussing how well the project met its goals and any future research that could continue from it.

1 http://news.bbc.co.uk/1/hi/entertainment/tv_and_radio/3177479.stm


2 Background Reading

Peer-to-Peer is the networking concept where each device on a network can share its own resources on an equivalent basis with other devices, with each device able to act as a server or a client. This network can be a physical one such as Ethernet, or a virtual overlay network such as Gnutella. The concept was originally designed as a way to distribute computing resources across many machines. Now the approach is used to help locate machines on the internet (DNS), or to download files from other internet users (Kazaa).

This chapter aims to discover how current streaming and Peer-to-Peer technologies work and to learn about future developments in these fields. It will then discuss the pros and cons of current implementations, in preparation for a design that builds on their strengths and fixes their weaknesses.

2.1 History of Peer-to-Peer

In the past few years Peer-to-Peer (P2P) has been a new and actively researched topic; however, the concept of P2P is much older and was fundamental to the creation of the ARPANET and the Internet [1]. In this concept each device on a network is considered a peer and shares its own resources on an equivalent basis with other peers. Every peer has access to any other peer’s resources, and may access them at will. This is the opposite of the client/server model, where all peers use the resources of one dedicated, more powerful server.

2.1.1 ARPANET and the early Internet

In 1969 the universities UCLA, UCSB and Utah, together with the Stanford Research Institute, formed the ARPANET [2]. This was the first network between different sites, with the goal of sharing the computing resources of each institution. There was no master/slave or client/server concept; each machine had equal power on the network.

Later, client/server applications such as Telnet and FTP became more popular. However, the P2P analogy still existed: the computers running the Telnet servers were also the computers that ran the Telnet clients.

The P2P aspects slowly decreased as the ARPANET became larger and concerns such as security and resource management became more important. The original network was very open, with any machine allowed access to any other machine. This caused problems with security, and in the late 1980s firewalls became commonplace, dividing the Internet into many smaller private networks with only a few computing resources exposed at each site.

The P2P aspect between sites had mostly disappeared; however, a few services still ran in a distributed but slightly more restrictive way. Instead of peers being able to connect to anything, trust networks were implemented in which dedicated servers were allowed access to other servers that provided the same resource. Examples of this were Usenet and DNS.

2.1.2 Domain Name System (DNS)

The DNS [3] maps human-readable addresses to machine-readable addresses, much as a phone book maps the name John Smith to the phone number 123-456. The system currently works by having 13 main DNS root servers [4] with many smaller DNS servers underneath them. These smaller servers are usually operated by Internet Service Providers (ISPs), which then provide the DNS service to all their users. When a request is made, a user asks their local DNS server; if the local DNS server doesn’t know the answer, it will ask the DNS server above it. The root server may not know the answer, but it might know the server with authority over that domain, and tell the local server to query the authority. This is a primitive form of peer-to-peer communication with partially distributed control; however, central control is there and can override the mapping for any domain. An example of central control being used abusively to override domain names appeared recently [5].
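The lookup path described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the real DNS protocol: the `NameServer` class, the `resolve` function, the `max_hops` guard and the example hostnames and addresses are all hypothetical names introduced here.

```python
class NameServer:
    """Toy name server that answers, refers to an authority, or gives up."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # the server "above" this one, if any
        self.records = {}         # hostname -> address known locally
        self.authorities = {}     # domain suffix -> authoritative NameServer

    def query(self, hostname):
        """Return ("answer", address), ("referral", server) or ("unknown", None)."""
        if hostname in self.records:
            return "answer", self.records[hostname]
        for suffix, server in self.authorities.items():
            if hostname.endswith(suffix):
                return "referral", server
        return "unknown", None


def resolve(local_server, hostname, max_hops=8):
    """Resolve hostname starting at local_server.

    Returns (address or None, names of the servers asked, in order).
    """
    asked = []
    server = local_server
    for _ in range(max_hops):          # assumed guard against referral loops
        if server is None:
            break
        asked.append(server.name)
        kind, value = server.query(hostname)
        if kind == "answer":
            return value, asked
        # Follow a referral to the authority, otherwise ask the server above.
        server = value if kind == "referral" else server.parent
    return None, asked


if __name__ == "__main__":
    root = NameServer("root")
    isp = NameServer("isp-local", parent=root)
    authority = NameServer("example-authority")
    authority.records["www.example.org"] = "192.0.2.10"
    root.authorities[".example.org"] = authority
    print(resolve(isp, "www.example.org"))
```

In this sketch the local server drives the whole lookup, which mirrors the description of the root telling the local server to query the authority rather than fetching the answer itself.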

2.1.3 Usenet

Usenet implements a decentralized model of control and is considered the grandfather of true peer-to-peer applications [1]. Because control is fully decentralized, no one person can govern what happens on the application’s network. Usenet was originally created to exchange files and messages between computers at the University of North Carolina and Duke University. The idea was that students could post messages, and students at either school could then read and reply to these messages. This task was originally automated by using UUCP (Unix-to-Unix Copy Protocol [6]), but later NNTP (Network News Transfer Protocol [7]) was designed to be a dedicated service for such traffic.

The system works by letting NNTP clients subscribe to certain channels/groups, to which messages on similar topics are sent. When a new message is posted to one of these groups, the local server keeps a copy. Later, other NNTP servers may ask the local server whether any new messages have been posted and, if so, transfer the new postings. The messages slowly make their way around the NNTP network as more servers check for updates.
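The pull-based propagation described above can be sketched as follows. This is a simplified model, not the NNTP wire protocol: the `NewsServer` class and its method names are hypothetical, and each article is reduced to a message identifier, a group name and a body.

```python
class NewsServer:
    """Toy news server that copies unseen articles from its neighbours."""

    def __init__(self, name):
        self.name = name
        self.articles = {}     # message id -> (group, body)
        self.neighbours = []   # servers this one polls for updates

    def post(self, message_id, group, body):
        """A local client posts an article; the local server keeps a copy."""
        self.articles[message_id] = (group, body)

    def new_since(self, known_ids):
        """Return the articles held here that the asking server lacks."""
        return {mid: art for mid, art in self.articles.items()
                if mid not in known_ids}

    def pull_from_neighbours(self):
        """Poll every neighbour once; return how many articles were copied."""
        copied = 0
        for neighbour in self.neighbours:
            fresh = neighbour.new_since(set(self.articles))
            self.articles.update(fresh)
            copied += len(fresh)
        return copied


if __name__ == "__main__":
    a, b, c = NewsServer("a"), NewsServer("b"), NewsServer("c")
    b.neighbours = [a]      # b polls a
    c.neighbours = [b]      # c polls b, so it is two hops from a
    a.post("<1@a>", "comp.example", "hello")
    for server in (c, b, c):
        print(server.name, "copied", server.pull_from_neighbours())
```

Because server `c` only ever polls `b`, an article posted on `a` reaches `c` only after `b` has polled first, which is the slow hop-by-hop spread the text describes.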

The control mechanisms on the network are interesting. The server a message originated from has permission to delete the message from the network by sending out a recall. Additionally, the network allows the creation of new groups by a global election. A new-group creation event is sent to a well-known control group and is listed there for a certain length of time. During this time any user of the NNTP network can vote for its acceptance. If more positive votes are recorded than negative, the group is created. This demonstrates a fully automated democracy.

2.2 Recent P2P

Until recent years no major user-oriented applications made heavy use of P2P ideas, but this all changed in 1999. The program called Napster [8] was released, starting a new wave of P2P protocols and applications. Since then the Napster idea has been improved, leading towards more advanced styles of P2P networks and to P2P being introduced to other areas of computing which Napster didn’t originally address.

2.2.1 Napster

This was the first popular P2P network in recent years. Unfortunately it wasn’t popular due to its technical abilities or because it addressed an important problem. It was popular because it provided millions of users with free music without the permission of the music owners. This attracted great attention from the media and created a new consumer base for this kind of application.


Napster was a closed system, so any information on its inner workings had to be reverse engineered [9]. It used a central server run by Napster, which all clients had to use for all control tasks. When a peer joined the network it would authenticate with the central server and upload a list of the music shared on the local machine. Other peers could then send search requests to the central server, which would return the names of any music that matched and the addresses of the peers sharing that music. The peer would then need to make one more request to the server, asking for permission to transfer the file from the other peer. If this request was accepted, the true P2P aspect of the system began: the peer made a direct TCP connection to the other peer and began the file transfer in a simple sequential way. This is illustrated in Figure 2.1.
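The role of the central server can be sketched as a simple in-memory index. This is an illustration only, since Napster’s real protocol was closed: the `CentralIndex` class, its method names and the peer address strings are hypothetical, and authentication and the permission request are left out.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central index mapping shared titles to the peers that hold them."""

    def __init__(self):
        self._files = defaultdict(set)   # title -> set of peer addresses

    def join(self, peer_address, shared_titles):
        """A peer joins and uploads the list of titles it shares."""
        for title in shared_titles:
            self._files[title].add(peer_address)

    def leave(self, peer_address):
        """Forget everything a departing peer was sharing."""
        for peers in self._files.values():
            peers.discard(peer_address)

    def search(self, term):
        """Return {title: [peer addresses]} for titles containing term."""
        term = term.lower()
        return {title: sorted(peers)
                for title, peers in self._files.items()
                if term in title.lower() and peers}


if __name__ == "__main__":
    index = CentralIndex()
    index.join("peer-d:6699", ["foo.mp3", "bar.mp3"])
    index.join("peer-c:6699", ["foo.mp3"])
    # The server only answers the search; the file itself would then be
    # fetched directly from one of the listed peers.
    print(index.search("foo"))
```

Every join, search and departure passes through this one object, which makes the bandwidth, indexing and single-point-of-failure concerns easy to see.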

There were many technical problems with this solution, namely scalability and reliability. All users on the network had to connect directly to the central server, which caused problems with bandwidth and machine power. It was also a difficult challenge to index a million users’ files and carry out searches on this huge database.

The second problem was due to the single point of failure of the central server. This ultimately was the reason that Napster stopped working in 2001 when legal issues forced the service to be terminated.

2.2.2 Gnutella

To overcome the need for central organisation, in the late 1990s a company named Nullsoft decided to develop a truly P2P application named Gnutella [10]. It would require no central authority of any kind, yet allow all the users to share and search for files on the entire network.

It worked by having peers relay all messages sent to them to all the peers they are connected to. The network messages would have a simple TTL (Time to Live) value to limit the distance a message would travel. To join the network a peer would need to know at least one peer on the network and connect to them. When the peer is connected they can make a query for more peers, and a list of peers will be returned, which in turn can be used to form a more strongly connected network.

[Figure 2.1 A simple Napster network. Labels: Central Server with DB; Peers A, B, C and D. Messages: “Find foo”, “Found on peer D”, “May I have foo?”, “Here is foo”. Key: >> indicates direction of message.]

[Figure 2.2 A search on a small Gnutella network. Labels: Peer A (start), Peer B and seven other peers. Messages: “got foo? Tell A” relayed between peers, and “Peer B has” replies. Key: > indicates direction of message.]

If a peer searched for a file on the network, it would send a message to all its connected peers, which in turn would re-send the message with a decremented TTL value to their peers, and so on until the TTL reached zero. This system was good in that a peer could quickly join a huge network by knowing only one other peer, and in that the network was completely decentralised. It did, however, have a few scalability problems of its own.
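The TTL-limited flooding described above can be sketched as follows. This is a simplified model, not the Gnutella wire protocol: the function name `flood_query` and the graph representation are assumptions for illustration, and the sketch also assumes that a peer relays a given query only the first time it sees it and never sends it back to the neighbour it came from.

```python
from collections import deque


def flood_query(graph, start, ttl):
    """Flood a query from `start`; return (peers reached, messages sent).

    `graph` maps each peer to the list of peers it is connected to.
    A peer relays a query only the first time it sees it, and never
    back to the neighbour it came from (an assumption of this sketch).
    """
    seen = {start}
    messages = 0
    queue = deque([(start, None, ttl)])
    while queue:
        peer, came_from, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this peer does not relay further
        for neighbour in graph[peer]:
            if neighbour == came_from:
                continue
            messages += 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, peer, remaining - 1))
    return seen - {start}, messages
```

On a chain of peers A, B, C, D, a query from A with a TTL of 2 reaches only B and C, which shows how a small TTL can leave content on the far side of the network unreachable.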

The main problem was that the control traffic (the messages allowing a new peer to join, and for searches to be carried out) would become very significant as the number of peers on the network increased. Research carried out by a former employee of Napster, Jordan Ritter [11], explains this problem in more detail. His paper explains that on a simple network where each peer is connected to 8 other peers, a simple search of 18 bytes would incur 1.8 MB of traffic after 5 hops, 13 MB after 6 hops, and a huge 91 MB after 7. This is an exponential increase and affects the network drastically once a few thousand users join. The problem is made worse by the number of queries carried out at any one time on the network. With just 1,600 users online an estimated one query a second would be carried out, with each peer required to handle up to 1MBps. Figure 2.2 illustrates a typical Gnutella network, and how a query would pass through the network.
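The growth in traffic can be reproduced with a short calculation. The sketch below assumes that the originator sends the query to all 8 of its connections and that every later peer relays it to its remaining 7. The per-message size of 83 bytes is an assumption made here (an 18 byte search string plus protocol and transport headers); it is used because it reproduces the figures quoted above. The function name `flood_traffic_bytes` is illustrative.

```python
def flood_traffic_bytes(connections, hops, message_bytes):
    """Bytes generated by one query flooded `hops` hops deep.

    The originator sends to `connections` peers; every peer after that
    relays to its remaining `connections - 1` peers.
    """
    messages = 0
    sent_this_hop = connections
    for _ in range(hops):
        messages += sent_this_hop
        sent_this_hop *= connections - 1
    return messages * message_bytes


# Assumption: 83 bytes per query message (18-byte search string plus headers).
for hops in (5, 6, 7):
    print(hops, flood_traffic_bytes(8, hops, 83))
# prints:
# 5 1859864
# 6 13019712
# 7 91138648
```

These totals correspond to roughly 1.8 MB, 13 MB and 91 MB, a sevenfold increase for each extra hop.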

These problems could be reduced if the network was made smaller or if the messages were restricted to only a few hops; however, the network would then be disjointed, with one side of the network not knowing about the other. This reveals another problem of Gnutella: content may be available on the network but not reachable by all.

The final aspect of this protocol which hasn’t been addressed is the way that peers could bias or exploit the network unfairly giving them more resources than other peers. Since the network is completely decentralised and no clear network standards were enforced, users started to exploit the network [12] to make their searches more important, or for them to take far more than they ever give back to the network.

2.2.3 Fasttrack

The most popular file sharing program is Kazaa, reaching more members online than any other P2P network. Kazaa, as well as a few other programs such as iMesh, Grokster and the original Morpheus, all used the Fasttrack protocol [13]. Their system was very similar to the Gnutella protocol, yet the network was closed and encrypted, hence only a small amount of information is known about the protocol.

The difference with Fasttrack was how it would organise its peers into a star style structure, as shown in Figure 2.3. Instead of having all peers on an equal basis there are two types of peer: normal peers and super-nodes. Peers would only connect to a super-node, and super-nodes would then connect among themselves. When a peer carried out a search it would be sent to its super-node, which would then relay the query to its connected super-nodes. It would not, however, relay the query to its own peers, because when a peer first connects the super-node creates a cache of the files stored by that peer, allowing the super-node to answer on behalf of its peers. This difference would reduce the network traffic drastically. With an estimated 100 peers to each super-node the network was able to scale a lot better.

[Figure 2.3 A simple query via super-nodes on Fasttrack. Labels: Peer A (start), Peer B, other peers and three super-nodes. Messages: “got foo? Tell A” relayed via the super-nodes, and “Peer B has” replies. Key: > indicates direction of message.]
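The super-node behaviour can be sketched as below. Since the Fasttrack protocol is closed, this only models what is described in the text: a super-node caches its peers’ file lists and relays queries to other super-nodes, never to ordinary peers. The class `SuperNode`, its methods and the use of a TTL to bound relaying are assumptions made for this example.

```python
class SuperNode:
    """Hypothetical super-node holding a cache of its peers' file lists."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []      # other super-nodes
        self.file_cache = {}      # file name -> set of peer names

    def attach_peer(self, peer_name, shared_files):
        """Cache the file list a normal peer uploads when it connects."""
        for file_name in shared_files:
            self.file_cache.setdefault(file_name, set()).add(peer_name)

    def query(self, file_name, ttl, seen=None):
        """Answer from the local cache, then relay to other super-nodes only."""
        seen = set() if seen is None else seen
        seen.add(self.name)
        hits = set(self.file_cache.get(file_name, ()))
        if ttl > 0:
            for other in self.neighbours:
                if other.name not in seen:
                    hits |= other.query(file_name, ttl - 1, seen)
        return hits
```

Ordinary peers never see the query at all, so the number of messages depends on the number of super-nodes rather than the total number of peers.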

With the addition of a super-node model there was a need for a new type of co-ordination between the peers to decide when a new super node was required. Information on how this works isn’t publicly available, but one possible approach is that when a super-node thinks it is too over-crowded it will promote one of its peers to a super-node, and redirect some of the load to the new super-node.

Even with these improvements, Fasttrack still had a few problems, namely privacy and users abusing the network. The privacy problem hasn’t been discussed yet, but it affects all the protocols mentioned so far: when a user carries out a search, every peer on the network can see it. This may be acceptable to some users; however, recent social and legal factors have pushed developers towards systems where everything is anonymous.

2.2.4 Gnutella2

The Gnutella2 [14] name is largely a “buzz-word” and the protocol hasn’t provided any significant improvements to the topic of P2P file sharing. It implements the Fasttrack super-node concept, but calls super-nodes hub nodes and normal peers leaf nodes. There is a slight difference in that each peer may connect to more than one hub node to improve reachability.

The only major improvement is in the routing of search messages. Each hub node will keep a cache of search requests in the form of a QHT (Query Hash Table). This is a table of queries carried out on the network with their results. This hash table is then transferred among the hub nodes allowing for new routing features such as filtering and forwarding. If a hub node knows that sending a search query to a neighbouring hub will return zero results it will filter this request and not send it. If it already knows the result to the query it will send the replies on behalf of its neighbours.

The problem of users abusing the network has not been addressed; however, the Gnutella2 developers believe that when the network is fully operational there will be no need to exploit or abuse the network, since it will work quickly and effectively for all users. This kind of social security is poor at best.

2.2.5 FreeNet

FreeNet is an interesting protocol in that it provides full anonymity for your actions on the network. It is described as “an adaptive peer-to-peer network application that permits the publication, replication, and retrieval of data while protecting the anonymity of both authors and readers.” [15]

It is very similar to a Gnutella network, but designed to act very much like a file system. No searching of data can be carried out; instead all data must be referenced directly by its name (in this case a 160-bit SHA-1 hash of the file). When a file is added to the network, parts of it are sent around the network without the peers knowing what the data is, or who first placed it there. Later, when someone requests this file, it is retrieved from any peers holding pieces of the data, without the requester knowing which peer is sending the data. This all works by relaying encrypted messages throughout the network, with very few direct peer-to-peer connections.
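Referencing data by a hash of its content can be illustrated with Python’s standard `hashlib` module. This is only a sketch of the naming idea; FreeNet’s real key formats, encryption and routing are more involved, and the `content_key` helper and the `store` dictionary are assumptions standing in for the network.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Return the 160-bit SHA-1 digest of `data` as 40 hex characters."""
    return hashlib.sha1(data).hexdigest()


store = {}                      # key -> data, standing in for the network
key = content_key(b"hello world")
store[key] = b"hello world"
```

Anyone who knows the key can ask for the data, but the key reveals nothing about the file name, which is why no keyword search is possible.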

Whilst this provides very effective anonymity, it causes the network to function very slowly, with the requested data being sent via many peers before it reaches its destination. It can also place a strain on peers’ bandwidth, since they have to re-transmit data passing through them.

2.2.6 Distributed.net

Until now only P2P networks designed for sharing files have been discussed, but files are not the only computing resource that can be shared. Distributed.net was founded in 1997 and currently has 44,000 participants in a global P2P network for sharing their computers’ processing power [16]. The current goal is to crack an RC5 72-bit encrypted message, which with a normal computer would take an astronomical length of time, in the order of millions of years. When the computers are connected in a huge Distributed.net network this length of time is cut down considerably. Current estimates are 1220 years before it is achieved, which is still several orders of magnitude better than millions.

Technically the system is very simple. Peers connect to a central server which distributes blocks of data for analysis. The peer then runs analysis against this data and returns the results to the central server. Since the analysis can be very time intensive the peer will only request new data once a day allowing for this system to handle many million concurrent clients.

2.2.7 Skype

Skype is another non-file-sharing P2P network; it is instead a Voice over IP (VoIP) [17] implementation. This is a system that allows you to speak to other users over the internet, similar to speaking over a phone. It uses very few P2P concepts other than a simple Gnutella network and the ability to route an encrypted voice conversation via peers [18]. In this kind of situation clients want the best connection between two users, which is usually achieved by a direct TCP connection. However, Skype recognises that some users are firewalled and direct connections aren’t always possible, and so allows conversations to be passed across the P2P network in the most efficient way available. It also boasts the ability for a conversation to take many routes to help improve performance.

2.2.8 BitTorrent

BitTorrent is the most recent P2P application to become popular, and the one by which this project is most heavily influenced. The protocol is very simple: there is no searching or advertising of files on the network, and each BitTorrent network exists only for a preset group of files. Therefore everyone on the network is trying to download the same thing. The system is broken into two parts, a tracker and peers. For each network there is one tracker, whose job is to keep a list of peers on the network and send this peer list to any peer that requests it. This central authority makes it very easy to track the users on the network, and avoids the reachability problems experienced in other P2P networks. This is illustrated in Figure 2.4.

[Figure 2.4 A BitTorrent network. Labels: one tracker and six peers.]

Now that the peers can find the addresses of all other peers, they can make TCP connections to as many peers as they deem necessary. By default BitTorrent will connect to the majority of known peers, because it has been shown that randomly constructed graphs with a large out degree can be very robust and stable for this style of network [19, 20].

There are two types of traffic between connected peers: control traffic and data traffic. The set of files which the network is downloading is split into a predetermined number of pieces, and each peer keeps a record of which pieces it currently has. On each new connection a list of completed pieces is exchanged between the two peers, after which a peer may request any piece it knows its neighbour has.

When a peer finishes downloading a new piece it informs all its neighbours of the completion. Eventually the peer will have all the pieces and turn into a seeder, which only shares. Since the control communication between peers is comparatively small, this part of the system can scale well. Also, since a peer can start sharing as soon as it has downloaded at least one piece, the network can quickly distribute the load among itself. This allows for an extremely quick download.
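The piece bookkeeping described in the last two paragraphs can be sketched as follows. This models only the control information (who has which piece), not the BitTorrent wire format or the transfer of data; the class `Peer` and its method names are assumptions made for this illustration.

```python
class Peer:
    """Hypothetical peer tracking which pieces it and its neighbours hold."""

    def __init__(self, name, num_pieces, have=()):
        self.name = name
        self.num_pieces = num_pieces
        self.have = set(have)        # pieces this peer has completed
        self.neighbours = {}         # neighbour Peer -> pieces it is known to have

    def connect(self, other):
        """On a new connection both sides exchange their completed-piece lists."""
        self.neighbours[other] = set(other.have)
        other.neighbours[self] = set(self.have)

    def wanted_from(self, other):
        """Pieces we lack that the neighbour is known to have."""
        return self.neighbours[other] - self.have

    def complete_piece(self, index):
        """Record a finished piece and inform every neighbour of it."""
        self.have.add(index)
        for neighbour in self.neighbours:
            neighbour.neighbours[self].add(index)

    def is_seeder(self):
        """A peer holding every piece only shares."""
        return len(self.have) == self.num_pieces
```

Each announcement carries only a piece number, so the control traffic stays small compared with the piece data itself.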

The problems, however, are centred on the tracker. The tracker is a single point of failure and a bottleneck. If peers can’t connect to the tracker then they can’t get information about other peers on the network, and so can never start the download. There are a few solutions to this problem, mostly involving running more than one tracker and load balancing between them using DNS.

2.3 Streaming Technologies

Streaming is the concept of sending continuous, time-dependent media over a network. This media can be created on the fly, for example from recording equipment, or it can be streamed from a stored medium. The paradigm is most similar to common radio broadcast; however, over the internet this concept is complicated by the point-to-point link infrastructure of the internet. These point-to-point links make it very hard to conduct any kind of broadcast, and make it expensive for broadcasters to recreate a broadcast concept.

There are two main approaches to adding broadcast functionality to the internet. The most suitable would be to change the structure of the internet; this, however, is not practical in the real world. The second solution is to create virtual overlay networks which add the broadcast ability, but at a cost.

2.3.1 Multicast

This is the ideal streaming technology for the internet. It was developed in 1989 at Stanford University and is documented in RFC 1112 [21]. Multicast is the concept of sending IP packets to a group address so that they reach every host which has joined the group. The system can be highly dynamic, with hosts joining and leaving constantly. The obvious advantage is that the source only needs to send one packet to a predefined group address and all hosts in the group will receive it. This is possible within a network if the routers are multicast enabled. However, network-layer multicast has not been widely deployed by ISPs [22] for commercial reasons, thus only a very small number of internet hosts are multicast enabled.

To overcome this problem research has been carried out into application/overlay layer multicast protocols [23], with varying degrees of success. The main problems are generally keeping the protocol overhead small and maintaining a high level of service. The simplest and most widely used streaming technology is the simple server/client model, where a server, or cluster of servers, sends the stream to many clients using the server’s own bandwidth. This can prove problematic if the bandwidth required for the number of clients isn’t available. The problem may be solved by giving some of the bandwidth load to the clients, allowing them to distribute the stream to fellow peers.

2.3.2 Batch Chaining

This concept was developed for Video on Demand, where any client can request any stream at any time. This kind of activity can place huge demands on a video server. The system proposed in a paper from the University of Central Florida [24] reduces this demand in two ways.

Firstly, it batches together clients that request the same stream at a similar time. The result is that the server sends to one peer in the batch, which distributes the stream to its siblings in the batch. The problem with this is that the first client to create a batch has to wait until the last client has joined, causing a delay for the first user. A typical waiting period would be in the order of 10 minutes.
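The batching step can be sketched as grouping request arrival times into fixed windows. This is an illustration only; the paper’s actual batching policy is not reproduced here, and the function name `batch_requests` and the rule that a batch closes `window` seconds after its first request are assumptions.

```python
def batch_requests(arrival_times, window):
    """Group request arrival times (in seconds) into batches.

    A batch opens with the first unassigned request and accepts every
    request that arrives within `window` seconds of that first request.
    """
    batches = []
    for t in sorted(arrival_times):
        if batches and t - batches[-1][0] <= window:
            batches[-1].append(t)
        else:
            batches.append([t])
    return batches
```

With a 600 second window, requests arriving at 0, 100, 650, 700 and 1400 seconds form three batches, and the client that opened each batch may wait up to the full window before its stream starts.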

The second improvement is to place adjacent batches next to each other in a chain. Once the earliest batch has finished reading the stream, it passes its cached data on to the next batch in the chain. If there aren’t any adjacent batches the server will create two separate chains. This can be seen in Figure 2.5, with each batch receiving a delayed segment of the stream from the previous batch.

There are, however, limitations to this approach, mainly involving reliability and trust. If a batch disconnects from the chain then any batches after it in the chain will have their streams disrupted and will need to reconnect to the source, since this will be the only peer with the correct segment of the stream. Secondly, all content is received from another batch; a peer in that batch may be corrupting the stream, which could be devastating further down the chain.

[Figure 2.5 Batch Chaining Technique. Labels: Source; Batch 1, Batch 2 and Batch 3 of Chain 1; Batch 1 of Chain 2.]

2.3.3 NICE

NICE [22] is a single-source media streaming protocol developed at the University of Maryland. It organises the peers into a tree structure rooted at the source server. It was designed to help distribute live continuous media quickly and effectively with low overheads.

[Figure 2.6 A NICE tree network. Labels: Source; Group 1, Group 2, Group 3, Group 4 ... Group X, each containing several peers.]

This solution improves on the chaining technique by organising the peers into a tree instead of a simple chain, thus increasing reliability and scalability. It works by creating groups (or batches) of peers, with these groups being connected into a tree hierarchy. One peer in each group is nominated the head of the group and makes the connection to the group above. The non-heads connect to other groups below them in the tree. This is illustrated in Figure 2.6.

There is still a problem with reliability, for example a whole branch would be disconnected if the root group left the tree. This is catered for by a quick recovery control protocol.

2.3.4 ZIGZAG

ZIGZAG [25], developed at the University of Central Florida, was heavily influenced by the NICE approach. The difference is that the path of data through the tree has been slightly modified to allow for faster recovery and less control traffic. It still uses the same node degree but increases the link degree. Each peer in a group would be connected to their parent node. These additional connections would only be used for reliability if the main link fails. Additionally, these new links could allow the peers to be organised into more than one tree. One purpose for this is to create one tree for data and a different tree for control traffic.

2.4 Recent Research

Most widely used P2P networks were designed outside of the research community, and as such they have problems with security and efficiency, but mostly with scalability. This section discusses several new P2P ideas which have been researched in the past few years but have yet to be adopted.

2.4.1 Pastry

Pastry [26] is an extendable peer-to-peer overlay network designed at Microsoft Research and Rice University. The idea was to create a fully decentralised network on top of which many different applications could run. The protocol implements a very scalable and efficient routing algorithm to provide application level routing which uses very little bandwidth, and guarantees all peers can be reached within $\log_{2^b}(N)$ hops, where N is the number of peers in the network and b is a routing parameter commonly set to 4.
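To give a feel for this bound, the sketch below computes the smallest whole number of hops h for which (2^b)^h is at least N, using integer arithmetic to avoid floating point rounding. The function name `max_hops` is an assumption for illustration, and rounding the logarithm up to a whole hop is this example’s choice rather than something stated in the text.

```python
def max_hops(num_peers, b=4):
    """Smallest h with (2**b)**h >= num_peers, i.e. ceil(log_{2^b} N)."""
    base = 2 ** b
    hops, reach = 0, 1
    while reach < num_peers:
        reach *= base
        hops += 1
    return hops
```

With b set to 4, a network of one million peers gives a bound of 5 hops, while b set to 1 gives 20 hops for the same network.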

It works by assigning every peer a random number inside a 128-bit range, called a key. Messages sent within the network are addressed to a specific key, allowing the network to implement its own routing algorithm based on this key. When any peer receives a message (or sends a new one) it has to choose which peer to forward it to. It does this using an internally kept routing table. Unlike the link-state and distance-vector methods, the peer only keeps data about peers near it instead of a global overview of the network. This allows for much smaller routing tables and less routing traffic; the routing of messages takes slightly more hops, but this is a reasonable trade-off. This concept is based on work by Plaxton et al [27].

When forwarding a message to any key, the peer looks for the numerically closest match in its routing table and forwards the message to that peer via the underlying network (for example IP). Eventually the message will get to the peer numerically closest to the destination, and if that peer doesn’t have the destination in its routing table then the message is undeliverable.
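The forwarding rule can be sketched as a greedy walk towards the numerically closest known peer. This is a deliberately simplified model: real Pastry routes on shared key prefixes over a circular key space, whereas this sketch uses plain integer distance with no wrap-around. The function name `route` and the `tables` structure are assumptions for illustration.

```python
def route(tables, start, dest_key):
    """Forward towards `dest_key` until no known peer is numerically closer.

    `tables` maps each peer id to the ids in its (hypothetical) routing table.
    Returns the path taken; the last entry is where the message ends up.
    """
    path = [start]
    current = start
    while current != dest_key:
        candidates = tables[current]
        best = min(candidates, key=lambda p: abs(p - dest_key), default=current)
        if abs(best - dest_key) >= abs(current - dest_key):
            break                      # nobody closer is known: stop here
        current = best
        path.append(current)
    return path
```

If the returned path does not end at the destination key, the message was undeliverable from the point of view of the last peer on the path.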

To guarantee that numerically close peers are always listed in each other’s routing tables, special steps have to be taken when a peer joins. This adds a little overhead to the join; however, it very quickly and effectively adds the peer to the network for everyone to access.

There are, however, two problems with the routing implementation which can occur in rare conditions. With the right combination of failures it is possible for the network to partition and not reconnect, leaving two isolated networks. However, if just one peer is on both networks the networks will re-join in a short time with little effort. There is also the previously undiscussed problem of peers being unreachable when very rare race conditions occur under a series of failures and joins on the network. For example, suppose 3 peers exist with keys 10, 20 and 30, and a fourth peer wants to forward a packet to peer 10 but only has peer 20 in its routing table. If peer 20 fails, the packet could be forwarded to peer 30; however, due to the order in which the peers joined, peer 30 also doesn’t know about peer 10 and therefore can’t send the packet anywhere, which blacks out peer 10. This problem is solved by increasing the b variable and isn’t considered a problem when the network is large enough.

One last notable addition to the Pastry protocol is the ability of network locality in the sense that Pastry gives nodes which are geographically close preference over nodes further away. This metric is determined by the application but could be based on IP Hops between hosts, or Latency times. It was discussed by Savage et al [28] that selecting geographically local hosts for routing may only be more effective 30-80% of the time. It was also discussed that triangle inequality won’t hold on the internet, causing the reverse path to be different to the forward path, however Pastry theory assumes triangle inequality holds.

2.4.2 SplitStream

SplitStream [29] is an application developed by Microsoft Research to operate on top of Pastry. It is a multicast overlay network used for streaming media, which constructs many balanced multicast trees where each peer can be a member of one or more trees. In a normal multicast tree with a node out degree of 2, over 50% of the nodes are leaf nodes which contribute nothing to the network. In a tree with a node out degree of 16, more than 90% are leaf nodes.
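The leaf-node percentages quoted above can be checked with a short calculation. The sketch assumes a full, balanced tree in which every interior node has exactly the stated out degree; `leaf_fraction` and its parameters are names chosen here for illustration.

```python
from fractions import Fraction


def leaf_fraction(out_degree, depth):
    """Fraction of nodes that are leaves in a full tree of the given depth."""
    leaves = out_degree ** depth
    total = sum(out_degree ** level for level in range(depth + 1))
    return Fraction(leaves, total)
```

For an out degree of 2 and depth 10 this gives 1024/2047, just over one half, and for an out degree of 16 and depth 3 it gives 4096/4369, which is about 94%.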

SplitStream tackles this problem by placing nodes in more than one tree, causing the node to be both a leaf and an interior node. On each tree a different segment of the stream is sent, meaning the stream must be split into a specified number of segments at the server. SplitStream presumes the senders will be using algorithms with Multiple Description Video properties allowing the video quality to drop when a node is not connected to all trees. This is useful when peers have varying amounts of bandwidth and can sacrifice video quality for bandwidth, but for those peers with higher bandwidth available they can subscribe to more trees and receive higher quality.

Using Multiple Description Coding (MDC) limits what continuous media can be sent across the network, but allows the protocol to claim robustness. However, when the network is delivering media that can’t recover from loss the system can become very unreliable. The time it takes a peer to (re)join the network is in the order of $\log_O N$, where O is the node out degree and N is the number of nodes on the network.

SplitStream also suffers from the same problems as NICE [22] such as branches of the multicast tree being severed with large numbers of peers losing part of their stream.

2.4.3 Chord

Chord is a decentralised lookup service that stores key/value pairs throughout the network. It works in a very similar way to Pastry; however, Chord uses a less efficient routing algorithm of order $\log N$, where N is the number of peers, whereas Pastry uses $\log_{2^b}(N)$, where b is the routing parameter and N is the number of peers.

The routing algorithm works very similarly, with each peer being assigned a random id. However, instead of the message being forwarded to the peer with the closest matching key, it is forwarded to the peer with its number closest to a power of 2. For example, if peer one sent a message it would be forwarded to peer 2, 4, 8, 16 and so on, depending on which power was nearest to the destination.
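The power-of-two forwarding can be sketched as below. This is one interpretation rather than Chord’s full algorithm: it measures the powers of two as distances from the current peer around a ring of ids, and it assumes for simplicity that every id on the ring is occupied by a live peer. The function name `chord_path` and its parameters are assumptions for illustration.

```python
def chord_path(source, dest, id_bits):
    """Hop from `source` to `dest` on a ring of 2**id_bits ids.

    Each hop jumps the largest power of two that does not overshoot.
    Assumes, for simplicity, that every id on the ring is a live peer.
    """
    ring = 2 ** id_bits
    path = [source]
    current = source
    while current != dest:
        distance = (dest - current) % ring
        jump = 1 << (distance.bit_length() - 1)   # largest power of 2 <= distance
        current = (current + jump) % ring
        path.append(current)
    return path
```

Because each hop at least halves the remaining distance, the number of hops never exceeds `id_bits`, which matches the logarithmic order quoted above.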

Chord also adds redundancy to the network, such that data is stored on more than one node, meaning more than one peer must disconnect before any data is lost. Pastry, however, stores all data on only one node; surrounding nodes are able to store this data as well, but this is not guaranteed. What Chord gains in redundancy it loses in scalability and protocol overheads. In theory Chord can still scale to millions of hosts; however, Pastry can scale more easily due to its more efficient routing algorithm.

2.5 Summary

This chapter has explained the history of P2P from the very first ARPANET to cutting edge research such as SplitStream and Chord. At each step along this history the pros and cons have been highlighted and discussed. Current advances in streaming technologies have also been discussed, with the main focus on P2P streaming.

The next chapter will now build upon these research ideas allowing a new concept to be designed which will hopefully avoid the pitfalls of previous projects.

3 Design

This chapter will give a high level view of the key components in the system and then go on to explain the design of the protocol used between these components. Each component will be explained and any algorithms of technical value will be discussed. The inner workings of the system components will be presented in UML diagrams with discussion of the main classes. The last sections will discuss testing and evaluation strategies for the proposed system.

3.1 Requirements

A list of requirements has been drawn up from the research carried out in the Background Reading chapter [see Chapter 2]. The requirements aim to address the negative aspects of current systems and to add functionality. Each requirement will be explained with a brief description of how it was derived.

3.1.1 Provide a robust network

The network should be fault tolerant and be able to survive peer failures without the receiving nodes suffering from interruptions.

Since the system will deal with the timely delivery of continuous media, it is reasonable to require that data gets to its destination on time and that peers leaving or joining the network do not interfere with this timely delivery. In current streaming technologies like Batch Chaining and NICE [see sections 2.3.2 and 2.3.3] peers will not receive data on time if the peers upstream of them experience problems. A strongly connected graph of peers would solve this problem and will be used by this system.

3.1.2 Allow quick re-join after peer failure

If a node does fail, the peer and any peers affected should be able to rejoin the network with minimal effort and without loss of service.

As described in the previous section, robustness is a requirement to ensure timely delivery; as such, any peer failures or joins should not adversely affect the network. ZIGZAG [see section 2.3.4] demonstrates a P2P network with fast recovery.

3.1.3 Stream data with low control overhead

The control traffic for constructing and maintaining the network should be as small as possible. Traffic about the stream should also be kept to a minimum so that peers can use most of their bandwidth to receive the stream data.

In networks such as Gnutella [see section 2.2.2] it was demonstrated that as the network grew, the control overheads to maintain the functionality of the network increased exponentially. It is therefore a reasonable requirement to request that the control traffic is low.

3.1.4 Move the stream distribution load away from the source

The source of the stream should only require a small amount of upload bandwidth, with most of the forwarding load being placed on peers within the network.

For technical and financial reasons, Batch Chaining [see section 2.3.2] was developed in an effort to move the distribution load away from the source. This change would greatly benefit the source and other peers on the network; as such, any protocol that aims to be widely usable should also exhibit this property.

3.1.5 Be scalable

None of the requirements above should hinder the scalability of the network, which should allow large numbers of peers to be on the network. Adding more peers to the network should not exponentially increase control traffic or source server load; in fact, if possible, no increase should be observed.

It was shown with DNS and Usenet [see sections 2.1.2 and 2.1.3] that scalability aids the successful adoption of a protocol. It is also worth noting that the limited success of Napster and Gnutella [see sections 2.2.1 and 2.2.2] was partly because of scalability issues once their user bases became large enough, causing both systems to break down.

3.1.6 Media agnostic

The protocol designed must be generic enough to stream any kind of continuous media, be it audio, video or even ticker style text.

This requirement doesn’t have a clear derivation, but is included to add greater flexibility and usefulness to the protocol.

3.1.7 Be secure

The network should be secure from tampering with the data and from unauthorised users receiving the stream. The scope of this project does not cover security, but the protocol should be designed in a way that allows it to be added in the future.

Security is a very large and growing area of computing, and any protocol which doesn’t make provision for security will never be globally accepted.

3.2 Peer-to-Peer Network

As described in the Background Reading chapter there are many types of existing P2P network model, and a few P2P streaming models. This system will implement a strongly connected graph of peers, adapted to the context of streaming by providing time indexed continuous media. This style of network was chosen for the following reasons:

Each peer will be able to find many other peers quickly and efficiently, thus satisfying requirement 3.1.2.

The large number of connections between peers will hopefully allow data to spread quickly throughout the network. This will also make the network more robust by providing many sources of the stream, helping to satisfy both requirements 3.1.1 and 3.1.4.

There will be low organisational overhead between peers because of the peer location service provided by the tracker, thus saving their bandwidth for the content and aiding requirement 3.1.3.

This idea is based on BitTorrent, but it needs modifying: the major change is that instead of having a fixed number of pieces, a continually increasing number will be used, with old pieces expiring from the network. BitTorrent also announces to other peers when a piece has been downloaded; this concept will be retained.


The system will comprise three main components, explained here and depicted in figure 3.1. The figure also shows the flow of data between the components, with the thickness of each line representing how well connected the components are. Additionally, the strongly connected nodes would be transferring more data than the weakly connected ones.

Tracker - The tracker is the peer coordinator. It will store a list of all peers on the network, and allow connecting peers to quickly find other peers to join. It will not handle any data concerned with the stream. It is simply there for peer discovery.

Peer – This is one of the nodes in the network which will download and re-send parts of the stream.

Source Peer - This is logically the same as another peer, however this is the source of the stream. Peers on the network will not know that the stream originates from this peer, and there will be no bias because of that. The source peer may also reside on the same machine as the tracker however this is not a requirement. This is shown in figure 3.1 by the box surrounding both source peer and tracker.

3.3 Stream Representation

The logical representation of the stream will be a sequential list of numbers with the stream starting at 0, reaching an arbitrary maximum and wrapping back to 0. Each integer number represents a fixed size block of bytes from the stream; these blocks will hereafter be named pieces. At any time any single peer may have a small non-continuous set of the total stream, however eventually the peer will have a continuous set of pieces allowing correct playback. It is also advised that the peer cache a number of these pieces for a limited amount of time so that they can be sent to other peers.

Figure 3.1 Diagram of Peers, Source Peer and Tracker (the Tracker and Source Peer share the same physical location; peers are joined by weak, low bandwidth links and strong, high bandwidth links)

Figure 3.2 Representation of a stream (pieces numbered 0 to n; each piece represents 512 bytes, so the pieces together represent n × 512 bytes)

It may also be advisable for the client to download the stream in sequential order however it is not essential and may improve performance if a set of pieces are downloaded concurrently in advance.
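The wrapping piece numbering described in this section can be sketched in C++ as follows. This is a minimal illustration only: the size of the index space (kIndexCount), the helper names and the rule used to decide which of two pieces is older are assumptions, since the design only states that the index reaches an arbitrary maximum and wraps back to 0.

```cpp
#include <cassert>
#include <cstdint>

// Assumed constants: the design leaves the maximum index arbitrary.
constexpr std::uint32_t kIndexCount = 1u << 16;  // indices 0 .. 65535 (assumption)
constexpr std::uint32_t kPieceSize = 512;        // bytes per piece, as in figure 3.2

// Index of the piece that follows `index`, wrapping back to 0 at the maximum.
std::uint32_t NextIndex(std::uint32_t index) {
    assert(index < kIndexCount);
    return (index + 1) % kIndexCount;
}

// Number of pieces from `from` forward to `to`, taking the wrap into account.
std::uint32_t ForwardDistance(std::uint32_t from, std::uint32_t to) {
    assert(from < kIndexCount && to < kIndexCount);
    return (to + kIndexCount - from) % kIndexCount;
}

// True if piece `a` is older than piece `b`. This assumes that the pieces
// still alive on the network span less than half of the index space.
bool IsOlder(std::uint32_t a, std::uint32_t b) {
    std::uint32_t distance = ForwardDistance(a, b);
    return distance != 0 && distance < kIndexCount / 2;
}

// Number of stream bytes covered by `pieces` consecutive pieces.
std::uint64_t BytesCovered(std::uint32_t pieces) {
    return static_cast<std::uint64_t>(pieces) * kPieceSize;
}
```

With helpers of this kind a client can compare indexes either side of the wrap, for example treating piece 65530 as older than piece 4.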

3.4 Tracker

A major problem with fully distributed networks is locating peers with the information. This is normally attributed to a high degree of network partitioning or low network reachability [see section 2.2.2]. To solve this problem a dedicated server designed for the sole purpose of tracking which peers are listening to the same streams will be designed.

In a BitTorrent style network there is a single central server called a tracker which stores a list of peers on the network. The reason for the tracker is to lower control overheads, and to limit the effects of network partitioning, thus fulfilling requirements 1.2 and 1.3. In the network there will only ever be a single source of the stream. It therefore makes sense for that peer to also run the tracker since only one is needed. Enforcing the use of a tracker does hinder requirements 1.1 and 1.5; however the effects will be acceptable.

There are a few slightly different solutions for the tracker protocol. A custom protocol could be written using a stateless protocol; however the choice made here is to go with a connection orientated protocol.

The tracker will be designed as an HTTP extension, allowing for dedicated web servers to be written as trackers, or a web application/script tracker to be written. HTTP suits this task because it is widely used online, and because the connections to the tracker are stateless and infrequent.

The only problem with the HTTP protocol is that it adds a little extra traffic overhead; however this is acceptable since it will be used infrequently, and compared to the size of the stream data the overhead is negligible.

3.5 Tracker-less Network

The current design will use a central server tracker for the network; however a future extension could use a fully distributed tracker. The reason the current design won't adopt this approach is so the project can focus on the streaming technology. The benefits of a tracker-less approach are numerous. One major benefit would be that the network is more scalable and therefore fewer resources would be needed by the stream's author.

The concept would be built upon a Pastry [see section 2.4.1] network. All the nodes using this protocol would be connected to this wide P2P tracker network regardless of which stream they are listening to. A Pastry network gives each peer a unique 128bit identifier. Additionally each stream would need to be assigned a unique 128bit ID. When connecting to a network a peer would send a message to the peer most closely matching the ID of the stream. This peer would be nominated to provide tracker functions for the specific stream. Each hop to the nominated peer from any interested peer would record that this peer is now listening to the stream. The closer the message gets to the peer with the stream ID, the larger the list of peers for that stream will be on each host. Each peer would be required to keep a list of 10 peer IDs which gets cycled when new peers join. Later, when a peer wants to retrieve a list of peers, it can send out requests with increasing time to live fields until it has received as many peers as it needs.

Figure 3.3 A tracker-less network (a grid of peers with IDs ranging from 1 to 97)

As shown in figure 3.3, two peers, 35 and 60, are designated as the trackers for two different streams. The thickness of the lines around the peers indicates how full their peer tables are, with each peer around the trackers storing a percentage of the list of peers using that stream. The more peers listening to that stream, the greater the radius of data generated. Take, for example, peer 2 routing a message to peer 60 asking for peers. Its message would go via 22, 45 and then 60. Each of those intermediate hops holds a limited part of the peer list. Peer 2 can query each of these progressively until it has obtained a large enough list.

It can be seen that this concept may be impractical for networks where by bad luck the peer with the same stream ID happens to be a low bandwidth user who isn’t able to fulfil all requests. Also it would be trivial for a malicious host to assign themselves the stream id and partially corrupt the peer table. Hopefully both these problems will be less important as the size of the network grows because the number of peers holding the peer table will increase.

3.6 Peer

The peer will require the most design. As in the BitTorrent concept, peers will first need to connect to the tracker to receive a list of peers via normal HTTP communication. The peer can then connect to as many of the listed peers as deemed necessary. The peer communication protocol will be a reliable connection orientated protocol (such as TCP) and be designed to be as small as possible to help with requirement 1.3. A stateless protocol (such as UDP) wasn't chosen due to its lack of reliability and stateless nature.

There are two fundamental ways the peer protocol could work. The peer could announce that a piece has arrived, or peers could query other peers for their piece set. Standard P2P file sharing networks work by querying for pieces, however this concept doesn't work well in a streaming environment. If a query was sent after each new piece was downloaded, another message would be sent in reply confirming or denying that the peer has that piece, therefore requiring twice the bandwidth. Also, when a remote peer does obtain a new piece, the local peer won't know until its next query. This poses a problem when timely delivery is a requirement and it is critical for the peer to get that piece on time.

Since an announcement protocol will be used, the overhead for that packet must be small because a large number of them will be sent. To further improve performance, announcements will only be sent to peers that haven't previously announced themselves as having that piece.

3.7 Source Peer

This will appear like any other peer, however it will never be required to download pieces since it will be the source of the stream. The stream will be read from a file, recording equipment, or another suitable IO device. It would be beneficial for the source to store the last x number of pieces to help spread them on the network. If the source expires the pieces too quickly, peers may miss that piece and it would never make it onto the network.

3.8 Peer and Tracker Overview

This section will discuss how the tracker and peers communicate between themselves. Details will be given in later sections, but an overview is displayed in figure 3.4.

The peer first connects to a tracker who manages the stream the peer is interested in. This will result in either an OK or an error. If an error occurs, the peer has no choice but to stop and deal with the error by either prompting the users or dealing with it internally. Following this, a list of peers will be requested from the tracker giving the client a subset of all the users connected to the network.

Now that the client knows some peers on the network it can make individual connections to each peer, carrying out a handshake and then becoming connected. Next the client will wait for announcements from other peers. Each announcement will inform the client of newly available pieces on the peers allowing the requesting to begin. The client will download the oldest (smallest numbered) piece and start to pre-cache a few pieces ahead of time. Once a piece has been requested and downloaded the client will announce to all its peers about the completion.

This process will continue while the client is playing back the media. The client should keep and request a set of pieces before and after the current playback location. The reason for the advance pieces is for pre-caching in case the stream is lost for a period of time. The reason for the older pieces is so that they can be shared with the network for a given amount of time.

Figure 3.4 UML Sequence diagram of Peer and Tracker interactions (participants: Tracker, Peer A, Peer B, Peer n; messages: Join Network, OK, List Peers, Address Of Peers, Connect To Peer B, Connect To Peer n, Announce Piece Done, Request Piece, Transmit Piece, Announce Piece Done)

3.9 Tracker Protocol

The tracker is responsible for holding the list of active peers, and sending this peer list to interested peers. It is also responsible for holding meta-information about the stream.

All information is transferred via normal HTTP protocol [RFC 2616] with a well known URI [RFC 2396] describing the location of the stream. The peer will send standard GET requests to this URI with differing query strings to determine what action the peer is taking. All data transferred in the URL will adhere to URL encoding specifications.

An example of a valid URL would be:

http://tracker.com/?action=join&peer-id=ABCDEFGHIJKLMNOPQRST&peer-port=4321

This would be requesting to join the network, with peer id A-T, with port 4321 listening for incoming connections. A full list of all the valid query parameters follows;
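A request URL of this form could be assembled as in the following C++ sketch. The function names are hypothetical, and the encoder is deliberately conservative: it percent-encodes every byte that is not a letter, a digit or one of `-`, `_`, `.` and `~`, which is an assumption about how strictly a client would apply URL encoding.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Percent-encode every byte except letters, digits and "-", "_", ".", "~".
std::string UrlEncode(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        bool safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                    (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                    c == '.' || c == '~';
        if (safe) {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // "%XX" plus the terminating NUL
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Build a tracker request such as the join example above.
// `base` is the well known URI of the stream, ending in "/".
std::string BuildTrackerUrl(const std::string& base, const std::string& action,
                            const std::string& peerId, unsigned port) {
    assert(peerId.size() == 20);  // the peer-id is a 20 byte string
    return base + "?action=" + UrlEncode(action) +
           "&peer-id=" + UrlEncode(peerId) +
           "&peer-port=" + std::to_string(port);
}
```

Calling BuildTrackerUrl("http://tracker.com/", "join", "ABCDEFGHIJKLMNOPQRST", 4321) reproduces the example URL; a peer-id containing arbitrary bytes would have those bytes percent-encoded.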

3.9.1 &peer-id=

This query field is required with all HTTP requests to uniquely identify the client to the tracker. The ID will consist of a random 20 byte string selected by the peer. This ID will be used to identify the peer in the future and should not be revealed to other peers. The ID will be recorded upon joining the stream; on all other commands it shall be used to confirm the identity of the peer, and if the ID is incorrect the command shall be ignored.

3.9.2 &peer-ip=

The IP address the peer believes it is listening on. The IP/Port cannot be used as a unique identifier since more than one listener may be on the same IP or behind the same NAT gateway.

3.9.3 &peer-port=

The TCP port that the peer is listening on for incoming connections. The IP/Port cannot be used as a unique identifier since more than one listener may be on the same IP or behind the same NAT gateway.

3.9.4 /?action=join

This is sent when the peer first wants to connect to the stream. The HTTP Body will contain stream specific data which should be used by the peer to understand the format of the stream data. For example, the header of an ogg stream would be sent so that the peer can pick up the stream from any position. HTTP Headers will also be sent explaining application specific details. X-BitStream-PieceSize and X-BitStream-ContentType are both required.

From this point the peer will be listed by the tracker as actively subscribing to the stream allowing other peers to connect to it.


3.9.5 /?action=part

This is sent when the peer decides to stop listening to the stream. The tracker should remove the peer from the list and free any memory about the peer. The peer-id must be included to make sure the correct peer is removed.

3.9.6 /?action=list

The peer will periodically request this to gain a list of peers. The peer can also request this list when it needs more peers to connect to. The normal interval for the peer to request this list should be every 5 minutes. If the peer does not keep to this interval the tracker should assume the peer has unintentionally disconnected from the stream and remove it from the list.
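The join, part and list actions, together with the removal of peers that stop requesting the list, can be sketched as a small in-memory table. This is an illustration under stated assumptions rather than the tracker's implementation: the class name, the method signatures, the use of seconds for time and the configurable timeout are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory peer table for a tracker. Times are in seconds.
class PeerRegistry {
public:
    explicit PeerRegistry(double timeoutSeconds) : timeout_(timeoutSeconds) {
        assert(timeoutSeconds > 0);
    }

    // action=join: record the peer's address and when it was last heard from.
    void Join(const std::string& peerId, const std::string& address, double now) {
        peers_[peerId] = Entry{address, now};
    }

    // action=part: forget the peer. Returns false if the peer-id is unknown.
    bool Part(const std::string& peerId) { return peers_.erase(peerId) > 0; }

    // action=list: drop peers that missed their interval, refresh the caller,
    // and return the addresses (never the IDs) of the other peers.
    // A request carrying an unknown peer-id is ignored.
    std::vector<std::string> List(const std::string& peerId, double now) {
        for (auto it = peers_.begin(); it != peers_.end();) {
            if (now - it->second.lastSeen > timeout_) it = peers_.erase(it);
            else ++it;
        }
        std::vector<std::string> result;
        auto caller = peers_.find(peerId);
        if (caller == peers_.end()) return result;
        caller->second.lastSeen = now;
        for (const auto& entry : peers_)
            if (entry.first != peerId) result.push_back(entry.second.address);
        return result;
    }

private:
    struct Entry {
        std::string address;
        double lastSeen;
    };
    std::map<std::string, Entry> peers_;
    double timeout_;
};
```

A tracker might choose a timeout somewhat longer than the 5 minute interval so that a slightly late request does not remove a healthy peer; that margin is a design choice this sketch leaves open.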

3.9.7 HTTP Headers

The following are the headers which can be sent upon join.

3.9.8 X-BitStream-PartSize

A required field indicating the size in bytes of each piece in the stream. Setting this to a lower value causes more peer control traffic but allows for less delay in the stream.

3.9.9 X-BitStream-ContentType

Required content type of the stream data, for example application/ogg [RFC 3534], video/mpeg etc.

3.9.10 X-BitStream-Title

An optional title of the stream.

3.10 Peer Protocol

Information between peers consists of control traffic such as requests and announcements, and actual media stream data. The stream is divided up into different fixed size pieces. Each piece has an integer index with the starting index depending on how far into the stream the current source is. When a peer acquires a new piece it should announce to all connected peers. Peers may optimistically delay announcements to save bandwidth. Peers may also batch announcements together to lower overheads.

3.10.1 Packets

Messages sent between peers will use a custom protocol via TCP. The TCP connections must be able to send data in both directions, allowing peers behind NATs and other such firewalls to operate normally. Each packet will have a header and then the packet body; this is illustrated simply in figure 3.5. The messages are designed in such a way that if a peer doesn't understand or implement a type of message it may skip it and receive the next message. This is to ensure maximum backwards compatibility.

Figure 3.5 Packet Diagram (layers shown: IP, TCP, Packet Header, Body)

3.10.2 Packet Header

Pre-pended to every message sent out, it is used to help identify the content of the message, and allows for backwards compatibility by specifying the packet length, allowing clients to skip packets they do not understand.

0 1 2 3
LENGTH
TYPE DATA…

LENGTH  Length of the packet in bytes excluding the length field
TYPE    A one byte code to explain what is contained in the data field
DATA    A variable length field containing data stored as the type explained

Current possible data types:
0 – Keep Alive
1 – Handshake
2 – Announcement
3 – Request
4 – Error
5 – Data
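The framing described above can be sketched in C++ as follows. The layout (a 4 byte LENGTH that excludes itself, a 1 byte TYPE, then DATA) follows the header description; the big-endian byte order and all function names are assumptions, since the text does not specify them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class PacketType : std::uint8_t {
    KeepAlive = 0, Handshake = 1, Announcement = 2, Request = 3, Error = 4, Data = 5
};

// Append a 32-bit value in big-endian (network) order. Byte order is assumed.
void PutU32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Read a big-endian 32-bit value from four bytes.
std::uint32_t GetU32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Frame a message: LENGTH (4 bytes, excluding itself), TYPE (1 byte), DATA.
std::vector<std::uint8_t> EncodePacket(PacketType type,
                                       const std::vector<std::uint8_t>& data) {
    assert(data.size() < 0xFFFFFFFFu);
    std::vector<std::uint8_t> out;
    PutU32(out, static_cast<std::uint32_t>(1 + data.size()));
    out.push_back(static_cast<std::uint8_t>(type));
    out.insert(out.end(), data.begin(), data.end());
    return out;
}

// Size in bytes of the first complete packet in `buf`, or 0 if more data is
// needed. A receiver that does not recognise TYPE can skip this many bytes.
std::size_t PacketSize(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < 4) return 0;
    std::size_t total = 4 + static_cast<std::size_t>(GetU32(buf.data()));
    return buf.size() >= total ? total : 0;
}
```

Because the length is read before the type, a client can step over message types it does not implement, which is the backwards compatibility property the header is designed for.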

3.10.3 Keep Alive

This data type has no contents and is used to keep the connection alive. The length field for this packet must be 1.

3.10.4 Handshake

This is sent once at the beginning of a connection. It allows each client to know what the other is capable of. Both parties must receive this message before any other message can be sent. Once the message has been received both parties will work using the lower major version, or error and disconnect.

0 1 2 3
VERSION MAJOR  VERSION MINOR
NAME…

VERSION MAJOR  Major version of the app; at time of writing only 1 is allowed and all data sent should conform to this specification
VERSION MINOR  Minor version of the app. Applications are allowed to place any number in this field
NAME           Variable length field containing the name of the client

3.10.5 Announcement

This message is used to advertise to other peers what parts of the stream this peer is sharing. The client should always announce the entire stream they have unless a sharing algorithm is being used to help distribute the stream more efficiently. Such algorithms are discussed in section 3.12.


0 1 2 3
INDEX START
BITFIELD…

INDEX START  The index represented by the first bit in the bitfield. This is an integer value incremented by the source peer on each new piece.
BITFIELD     A variable length array of bits. Each bit represents an index one newer than the previous bit. The bit is set depending on whether the peer has that piece of the stream.
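The announcement body can be sketched as a start index plus a packed array of bits. In this C++ sketch the struct and function names are hypothetical, the first piece is assumed to occupy the most significant bit of the first byte (the text does not fix the bit order), and index wrap-around is ignored for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

struct Announcement {
    std::uint32_t indexStart;            // index represented by the first bit
    std::vector<std::uint8_t> bitfield;  // one bit per piece, MSB first (assumed)
};

// Build an announcement covering `count` consecutive indexes from `indexStart`,
// setting a bit for every piece present in `have`.
Announcement MakeAnnouncement(const std::set<std::uint32_t>& have,
                              std::uint32_t indexStart, std::uint32_t count) {
    assert(count > 0);
    Announcement a{indexStart, std::vector<std::uint8_t>((count + 7) / 8)};
    for (std::uint32_t i = 0; i < count; ++i)
        if (have.count(indexStart + i))
            a.bitfield[i / 8] |= static_cast<std::uint8_t>(0x80u >> (i % 8));
    return a;
}

// True if the announcement says the remote peer has piece `index`.
bool HasPiece(const Announcement& a, std::uint32_t index) {
    if (index < a.indexStart) return false;
    std::uint32_t i = index - a.indexStart;
    if (i / 8 >= a.bitfield.size()) return false;
    return (a.bitfield[i / 8] & (0x80u >> (i % 8))) != 0;
}
```

Announcing a window of eight pieces this way costs one bitfield byte on top of the start index, which keeps each announcement small.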

3.10.6 Request

This is sent when a client requires a piece of the stream from another client. The first and last indexes are sent, asking for all pieces in the range first ≤ x < last. The other client must have announced all pieces between start and end before a request can be made.

0 1 2 3
INDEX START
INDEX END

INDEX START  The index of the first requested part.
INDEX END    The index of the last requested part.

3.10.7 Data

This can be sent after a request is made, or if a pre-emptive algorithm is being used to help distribute the data, as discussed in section 3.12.3.

0 1 2 3
INDEX START
DATA…

INDEX START  The beginning index of this data.
DATA         Raw data of the stream starting at INDEX START.

3.11 Program Design

The design is split into many classes, each designed to perform a specific task, with abstract classes hiding the inner workings. The UML shown in figure 3.6 shows a possible configuration of classes for a peer. Each major class will be explained in turn.


The program will be coded in an Object Orientated (OO) language, namely C++. There are numerous reasons to code in C++, the main ones are the speeds and abilities of C, plus the Object Orientated aspects allowing for a very modular design and code reuse. The design will make use of a lot of OO concepts such as inheritance and interfaces to allow the clients to be flexible in the media they transfer and in the way they do it.

3.11.1 PeerClient

This is a very simple class which contains the main method and the starting point of the program. It will parse any user input and create the correct classes depending on the features required by the client.

Figure 3.6 UML Diagram of different classes within the system (classes shown: PeerClient, PeerManager, PeerConnection, PeerEvent, PeerPacketFactory, PeerPacket, StreamBufferInterface with StreamBuffer and FileStreamBuffer, and PlaybackInterface with OggPlayback, FileWriterPlayback and VideoPlayback)

3.11.2 StreamBufferInterface

A key class which stores the pieces sent and received. It should be created with two parameters setting the number of pieces it holds and the size of each piece. It will implement methods such as Read(), Write() and Peek(), allowing it to randomly read and write to any stored piece and to read from the buffer as if it was a stream. If it's not possible to read sequentially because a required piece is missing, the object should block until the piece has been downloaded, or until a given time has passed, after which the function returns an error which the Playback device should deal with. The two main implementations and a possible third are listed here:

StreamBuffer - A simple random access buffer designed to store pieces which are randomly inserted, and allow stream like reading from this buffer.

FileStreamBuffer - A read only buffer which is created from a file. This type of buffer would be useful for a source peer.

AudioStreamBuffer - Another read only buffer which would pull the stream from a sound card or other input device.
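A thread safe buffer with the blocking read described above might look like the following C++ sketch. Only the constructor parameters and the method names Read(), Write() and Peek() come from the design; the signatures, the slot layout (piece index modulo the number of slots) and the use of a timeout argument are assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

class StreamBuffer {
public:
    StreamBuffer(std::size_t elements, std::size_t elementSize)
        : slots_(elements), elementSize_(elementSize) {
        assert(elements > 0);
    }

    // Store a piece at any index. A newer piece mapping to the same slot
    // overwrites the older one, which is how old pieces expire here.
    void Write(std::uint32_t index, const std::vector<std::uint8_t>& data) {
        assert(data.size() == elementSize_);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            Slot& slot = slots_[index % slots_.size()];
            slot.index = index;
            slot.valid = true;
            slot.data = data;
        }
        arrived_.notify_all();
    }

    // Non-blocking check for whether a piece is currently held.
    bool Peek(std::uint32_t index) {
        std::lock_guard<std::mutex> lock(mutex_);
        return Has(index);
    }

    // Block until the piece arrives or `timeout` passes. Returns false on
    // timeout so that the playback class can decide how to recover.
    bool Read(std::uint32_t index, std::vector<std::uint8_t>& out,
              std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!arrived_.wait_for(lock, timeout, [&] { return Has(index); }))
            return false;
        out = slots_[index % slots_.size()].data;
        return true;
    }

private:
    struct Slot {
        std::uint32_t index = 0;
        bool valid = false;
        std::vector<std::uint8_t> data;
    };

    bool Has(std::uint32_t index) const {
        const Slot& slot = slots_[index % slots_.size()];
        return slot.valid && slot.index == index;
    }

    std::vector<Slot> slots_;
    std::size_t elementSize_;
    std::mutex mutex_;
    std::condition_variable arrived_;
};
```

A playback thread would call Read() with successive indexes while the network thread calls Write() as pieces arrive; a false return signals the playback class that the piece did not arrive in time.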

3.11.3 PlaybackInterface

Because the protocol is media agnostic, any type of media can in theory be played back. For this reason an interface class was created with common methods such as Play() and Stop(). Three possible implementations of this class are:

OggPlayback - Decodes the media as an Ogg Vorbis² audio stream.

FileWriterPlayback - Writes the stream directly to file.

VideoPlayback - Decodes the stream as a video stream encoded with MPEG or a similar codec.

All these classes would be created with an instance of a StreamBufferInterface being passed in. In most common cases a StreamBuffer would be passed in, however for debugging or testing purposes a FileStreamBuffer can be used. The playback will then occur from this buffer.

3.11.4 PeerConnection

This class is a logical representation of a connection to a remote peer. It will deal with all the network connections, including packet sending/receiving, and provide a high level view of the status of the remote peer. It provides functions such as Open() and Close() for connections, SendPacket(), RequestPiece(), and GetPieceMap(), which will return a bit map of the pieces the remote host has.

3.11.5 PeerManager

This class is where the majority of important algorithms will go. It will be designed to be a coordinator of all the PeerConnections. Internally it will store a list of all Peers with their PeerConnections. Decisions will take place to decide which pieces to download, which peers to connect to, which peers to drop, etc. The design of the algorithms will be mentioned in section 3.12.

The main methods exposed will be Connect(), Close() and Listen(), which connect to a new peer, close all connections, and open a port for incoming connections respectively. FindNextPacket() is a private method which will decide which piece the next request should be for and which peer it should be sent to.

2 Ogg Vorbis, http://www.xiph.org/ogg/vorbis/


3.11.6 PeerPackets

As shown in figure 3.7, all packets will logically be represented as objects which implement the PeerPacket interface. The objects will have two simple constructors. One makes the packet from raw received data. The other makes the packet from properties such as the client's name in a HandshakePacket.

The PeerPacketFactory will be used to create the packets with the first constructor. The PeerConnection or any other class may call the static function read on the PeerPacketFactory class with raw received data as a parameter; it will then return an instance of a PeerPacket. This hides the decoding process and allows the main application to deal with PeerPackets instead of all the individual types.

3.12 Algorithms

3.12.1 Piece Picking Quality of Service

When the client is running it must pick which pieces of the stream it will download from which peers. There are a few different methodologies for doing this that aim to provide the best results and a certain level of QoS (Quality of Service). If the client picks a peer which is too slow, the time requirements of the media will expire and the quality of the stream will decrease. If however a peer gets all its pieces quickly at the cost of another peer then there is an unfair QoS in the network. If the source peer gets overloaded with requests, the original stream won't even make it onto the network on time. Possible algorithms for piece selection follow:

Random order - The pieces are picked from a random peer which doesn't have any piece waiting to be downloaded. This has the property of randomly distributing the load throughout the network equally, however low bandwidth peers will suffer if the average bandwidth required is higher than their own.

Uniformly cycling - Sequentially cycle through all connected peers, downloading from the peers in a logical order. This will have similar properties to the random order.

RTT Values - A round trip time for each peer will be taken in seconds, with the quickest peers being picked in preference to the slower peers. This is not guaranteed to provide the best peer because RTT is not always an accurate estimate of bandwidth, as discussed in “Improving Round-Trip Time Estimates in Reliable Transport Protocols” by Karn et al [30].

Self Scoring - An internal count can be kept of how many pieces have arrived from each peer in the last few minutes. The peers would then be sorted by this count, allowing faster peers to be used more often; if a peer becomes slow its count will decrease as old arrivals expire, and it will be used less often.

Figure 3.7 UML of different PeerPackets (the PeerPacket interface, with getContents(), getType() and getLen(), is implemented by PeerHeaderPacket, PeerAnnouncePacket, PeerHandshakePacket, PeerDataPacket and PeerRequestPacket; PeerPacketFactory provides read() to turn received data into objects)

Also, to ensure timely delivery, if there is no reply to a piece request within a specific length of time the request will expire and another peer will be asked for that piece.
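Of the selection strategies listed, self scoring is the one that needs some bookkeeping, so a minimal C++ sketch of it is given here. The class name, the string peer identifiers and the window length passed to the constructor are hypothetical; arrival times are plain seconds supplied by the caller and are assumed to be reported in non-decreasing order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Counts, per peer, the pieces that arrived within a sliding time window.
class PeerScores {
public:
    explicit PeerScores(double windowSeconds) : window_(windowSeconds) {
        assert(windowSeconds > 0);
    }

    // Record that a piece arrived from `peer` at time `now` (seconds).
    void PieceArrived(const std::string& peer, double now) {
        arrivals_[peer].push_back(now);
    }

    // Peers ordered by the number of pieces delivered inside the window,
    // best first. Arrivals older than the window are discarded, so a peer
    // that becomes slow gradually drops down the ranking.
    std::vector<std::string> Ranked(double now) {
        typedef std::pair<std::size_t, std::string> Scored;
        std::vector<Scored> scored;
        for (auto& entry : arrivals_) {
            std::deque<double>& times = entry.second;
            while (!times.empty() && times.front() <= now - window_)
                times.pop_front();
            scored.push_back(Scored(times.size(), entry.first));
        }
        std::stable_sort(scored.begin(), scored.end(),
                         [](const Scored& a, const Scored& b) {
                             return a.first > b.first;
                         });
        std::vector<std::string> order;
        for (const Scored& s : scored) order.push_back(s.second);
        return order;
    }

private:
    double window_;
    std::map<std::string, std::deque<double>> arrivals_;
};
```

The piece picker would request from the first peer in the ranking that has announced the wanted piece, falling back down the list when a request expires.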

3.12.2 Source Saturation Problem

A design problem that may occur is a saturation of requests sent to the source. If all peers are buffered ahead as much as the source peer then as soon as the source announces a new piece to the network they will all request that piece from the source and therefore congest the source. To solve this problem the source could announce only to a random set of peers, which would then later announce the new piece and spread it throughout the network.
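Choosing the random set of peers can be sketched in C++ as below. The function name and the use of integer peer handles are hypothetical, and how large the subset should be is left as a parameter because the design does not fix it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Pick up to `count` distinct peers at random to receive a new announcement.
// `peers` is taken by value so the caller's list keeps its order.
std::vector<int> PickAnnounceTargets(std::vector<int> peers, std::size_t count,
                                     std::mt19937& rng) {
    std::shuffle(peers.begin(), peers.end(), rng);
    if (peers.size() > count)
        peers.erase(peers.begin() + static_cast<std::ptrdiff_t>(count), peers.end());
    assert(peers.size() <= count);
    return peers;
}
```

The source would announce each new piece only to the returned peers, relying on their own announcements to spread knowledge of the piece further.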

3.12.3 Pre-emptive Sending

In future extensions it may benefit peers to pre-emptively send connected peers newly acquired pieces instead of waiting for a request. This may be used by a source peer who doesn't announce any new pieces but instead sends them intelligently. This will help solve the problem in section 3.12.2 and may lower the packet overhead and delays.

3.13 Code Testing Strategies

One of the reasons the project will be designed in an Object Orientated language is that strong unit testing strategies can be adopted. Each class should have a well defined job with clearly defined input and output parameters. It should be easy to define a set of test cases to test the functionality of each class.

While coding takes place the classes should be tested independently of each other in special smaller test projects. A simple testing framework should be developed for the issuing of simple tests and checking the output. This can be as simple as demonstrated here in this trivial example:

// Check if function computes correct sum
int result = sum(10, 20);
if (result != 30)
    throw std::runtime_error("sum(10, 20) did not return 30"); // requires <stdexcept>

// Check if function...

A script in the above form could be generated that could be run against the class each time it is changed.

If a more complex solution is required, testing frameworks such as CppUnit³ or C++ Test⁴ could be used, which give the benefits of automated testing and allow greater sets of tests to be carried out.

For the use of this project the simpler test scripts will be used due to the size of the project, smaller setup costs and ease of use. If in the future the project was to grow to include far more, a more advanced testing strategy would be adopted.

3.14 System Evaluation Strategies

Once the program has been completed an evaluation of its usefulness must be carried out. The evaluation should help decide if the requirements set out in 3.1 have been met and if not, why they haven’t. The main requirements to evaluate should be “Is it robust”, “Does it scale” and “Low control overheads”.

3 CppUnit, http://cppunit.sourceforge.net/
4 C++ Test, http://cpptest.sourceforge.net/

The testing will be carried out by connecting a set number of peers to the network and streaming a test file. Detailed logs will have to be made to help the analysis afterwards. The majority of the testing will occur over a LAN, however at least one test will be over the internet. The reason for this is the limited availability of internet connected machines.

A suitable measurement for “Is it robust” is how long the network takes to recover after a peer fails and how many peers are affected. This measurement can be recorded by setting up a set of peers to receive a stream, then randomly killing peers and recording how long it takes the other peers to catch up and how many needed to catch up.

The scaling attribute is one that doesn’t directly affect the streaming protocol but more so affects the peer location aspect. While streaming, a peer is only connected to a finite number of other peers. The number of connected peers will affect the streaming protocol, however in normal operation this number is low, and even if it does become high the predicted results are still very acceptable. The direct problem with scalability is with the tracker and with the source peer. The tracker is a single point of failure, and a tracker-less alternative and its own faults have been discussed in section 3.5 Tracker-less Network, so for the scope of this evaluation the tracker will be ignored. Future research can be carried out into the concept of a tracker-less network, but with current Pastry and Chord implementations showing hugely scalable networks we can partially infer that the scheme would scale.

The source peer also could become a bottleneck if the system is to scale. It has already been discussed in section 3.12.2 Source Saturation Problem why this might be a problem and possible solutions. For the moment we will assume it could be a problem and apply some evaluation to the area. A suitable measure of this would be to record how many of the peers request the data from the source compared to neighbouring peers. If they request all the data from the source then the network has certainly not scaled, and the system is no better than standard single point streaming. If however a low percentage of data is requested from the source then the protocol is working well.

An easy measure of control overhead is the amount of overhead required compared to the amount of data received. This will simply take the form of the size of the announce and request packets compared to the size of the data packets.

3.14.1 Predicted Results

This section makes predictions about the testing. This will aid in the analysis of any data collected, and help pinpoint possible problems before they occur. The predictions will be calculated by modelling the protocol as a set of equations using optimal conditions.

The sequence of events for acquiring a new piece is listed below, where ConnectedPeers is the number of peers you are connected to and PieceSize is the size of each piece. In this example ConnectedPeers = 100 and PieceSize = 10,000 bytes.

Step  Direction  Packet    Size calculation                               Bytes
1     IN         Announce  5 header bytes + 5 bytes                       10
2     OUT        Requests  5 header bytes + 8 bytes                       13
3     IN         Data      5 header bytes + 4 bytes + PieceSize bytes     10009
4     OUT        Announce  ConnectedPeers * (5 header bytes + 5 bytes)    100

Total Overhead  123 bytes
Total Data      10,000 bytes

Table 3.1 Sequence of events for acquiring a new piece

The first calculated prediction is “Low control overheads”. Since it is possible to work out in advance how much overhead is sent, we can make a fairly reliable prediction. Table 3.1 shows the common pattern of actions needed to download a piece of the stream. It can be seen that if a peer is connected to 100 other peers the overheads required would be 123 bytes for every 10,000 bytes of data. This represents a percentage of 1.23%. Of course both these values will change depending on the number of connected peers and the size of each piece.
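The calculation can be sketched as below. This is an illustration rather than code from the project. The function name is an assumption, and it follows the table's accounting, which counts the announce and request packets but not the 9 bytes of framing on the data packet. Evaluated this way, the 123-byte total corresponds to the outgoing announce reaching 10 connected peers. With 100 connected peers the step 4 formula gives 1,000 bytes and an overhead of 1,023 bytes per piece, so the figure is sensitive to the peer count.

```python
HEADER_BYTES = 5  # per-packet header size used in Table 3.1


def control_overhead(connected_peers, piece_size):
    """Return (overhead_bytes, overhead_ratio) for downloading one piece."""
    announce_in = HEADER_BYTES + 5                        # step 1
    request_out = HEADER_BYTES + 8                        # step 2
    announce_out = connected_peers * (HEADER_BYTES + 5)   # step 4
    overhead = announce_in + request_out + announce_out
    return overhead, overhead / piece_size


for peers in (10, 100):
    overhead, ratio = control_overhead(peers, 10_000)
    print(f"{peers} peers: {overhead} bytes of overhead ({ratio:.2%})")
```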

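The source peer scaling issue is harder to predict because, in the current implementation, timing issues will affect it greatly. For a network of n peers, the fraction of the stream's traffic requested from the source can range over:

$$\frac{1}{n} \le \frac{\text{data requested from the source}}{\text{total data requested}} \le 1$$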

This equation shows a huge range of values. In ideal conditions a network of 10,000 peers would only use the source for 0.01% of the traffic; however in the worst case scenario it would use it for 100% (which is as bad as single point streaming). It should be noted that these two bounds represent the source sending out between 1 and n copies of the stream: 1 copy is the absolute minimum and is preferable, while n copies is the absolute maximum and the worst case.

Robustness would be the hardest property to record accurately, again due to timing issues and the number of factors involved. In a simple example, say there is a network of 100 peers, with each peer requesting one packet from 10 others. If one peer dropped offline without warning, up to 10 other peers would be affected. Each of these 10 peers would wait a reasonable amount of time before giving up and trying another peer. If this wait time is 5 seconds, each peer would be delayed by 5 seconds. If however the peers are all pre-caching 30 seconds of data then they can withstand 6 failures in a row before the stream playback is affected. For that number of failures to occur 6/10 of the network would have to fail at the same time. The chance of this seems very low in a diverse and widely spread network of 100 peers.
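The buffer arithmetic in this example can be written out as a small sketch. The function name and its parameters are illustrative assumptions. It simply divides the pre-cached playback time by the time lost per failed request.

```python
def tolerated_failures(buffer_seconds, retry_timeout_seconds):
    """Consecutive peer failures that the pre-cached buffer can absorb."""
    if retry_timeout_seconds <= 0:
        raise ValueError("timeout must be positive")
    return buffer_seconds // retry_timeout_seconds


# 30 seconds of pre-cached data, 5 seconds lost per failed request.
print(tolerated_failures(30, 5))
```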

3.15 Summary

This chapter has discussed all aspects of the proposed system. It began by laying out the requirements of such a system and moved on to discussing the type of network which would fulfil these requirements. The more detailed concepts of peers and trackers were designed, and the protocol defined. The chapter ended with code and system testing strategies which will aid the testing of the system in Chapter 6.

The next chapter will now discuss how the implementation of the system differed from the design and if any problems occurred that forced design changes.


4 Implementation

The implementation turned out to be very similar to the designed solution, with only a few alterations and additions. This chapter follows on from the design and explains what changed and what problems were encountered.

4.1 Changes

This section highlights in detail the subsections of the program which were changed, and how they fit into the overall design.

4.1.1 Tracker

The program design of the tracker, and of the code which connects the peer to the tracker, was left out of the design section entirely. Figure 4.1 shows how the PeerClient uses a TrackerProxy object which abstracts the sending of an HTTP request to the tracker and the waiting for the reply. The tracker side of the diagram waits for HTTP connections and deals with them internally.

Figure 4.2 is a UML sequence diagram that displays more accurately how the communication between the internal classes works. The tracker on its own is just a simple web server, with each incoming request being logically turned into a CHTTPRequest object. This then gets passed to whatever web application the web server is running, in this case a TrackerMain object. The TrackerMain object then parses the request and produces a CHTTPReply, which the web server serializes back into a normal HTTP reply.

[Figure: UML class diagram. Classes shown: PeerClient (+main()), TrackerProxy (+Join(), +List(), +Part(), +TimeExpired()), Web::URL, Web::Request, Tracker, TrackerMain, CHTTPRequest (+getHeader(), +getURL()), CHTTPReply (+setReplyCode(), +addHeader(), +setBody()), Peer (-IP, -Port). The peer and tracker sides are joined by an HTTP connection.]

Figure 4.1 UML Diagram of tracker design

4.1.2 PeerManager

The PeerManager class became the core of the program as development took place. It was responsible for deciding which peers to connect to and which pieces to download from which peer.

[Figure: UML sequence diagram. Participants: PeerClient, Tracker, CHTTPRequest, TrackerMain, CHTTPReply. Messages shown: HTTP Message, Create New, Handle Request (CHTTPRequest), getURL, Create New, addHeaders(), addBody(), Returns CHTTPReply, Returns Reply.]

Figure 4.2 UML Sequence diagram on how the tracker works internally

[Figure: UML sequence diagram. Participants: PeerClient, PeerManager, PeerConnection, PeerEvent, ListenThread, KickstartThread. Messages shown: CreateThread, Listen(port), Connect(host, port), CreateThread, Create(host, port), Open(), FindNextPacket(), SendPacket(Request), Send, Recv, Create, ThrowEvent(Data Received), Save Data, ThrowEvent(PieceReceived), SendPacket(Announce), Send, Destroy, FindNextPacket.]

Figure 4.3 UML Sequence diagram of PeerManager connecting to a peer, then requesting a piece and finally announcing its completion

It would also decide when and to whom it would send packets, for example Announcement or Data packets. It also handled all the notification events sent by objects such as the PeerConnection. A typical sequence of events is shown in Figure 4.3.

The sequence diagram shows the PeerManager being set to listen for incoming connections. This operation causes the PeerManager to spawn an internal thread which loops waiting for a connection. At some undetermined time later the PeerClient or the ListenThread will make a connect call to the PeerManager, which tells the PeerManager about a new peer connection. The PeerManager first checks whether the KickStart thread is running and, if not, spawns it. Following this it creates a PeerConnection object which connects to the remote peer.

The KickStart thread fires every 5 seconds to check whether any connections are free for download. It does this by calling the FindNextPacket method on the PeerManager, which in turn finds the first missing piece and requests it from the newly formed PeerConnection (we are assuming that this new PeerConnection has the piece). The PeerConnection internally handles the request and download of the piece and stores it in the correct StreamBuffer.

At this point a PeerEvent object is created and sent to the PeerManager to signal the download of some data. In turn the PeerManager sends its own event to the PeerClient saying that a piece was downloaded. Finally the PeerManager calls SendPacket(Announce) on all PeerConnections. This whole process is then repeated with the next piece and/or new peers.

4.1.3 Vorbis Ogg Playback Library

The first implementations streamed Vorbis Ogg audio. Due to the media agnostic nature of the protocol and good design patterns, the only section of the application which needed changing to allow different media types was the Playback class. A new inherited class called OggPlayback was designed. It exposes the same methods as any other playback device but internally decodes the stream into a RAW WAV format and directs this at the default sound card. This is depicted in Figure 4.4.

The decoding of the audio was done by the library libvorbis, developed, in part, by the Ogg Vorbis Codec project [5]. This library was wrapped by the OggPlay class which, in turn, was wrapped by the OggPlayback class. In addition to the OggPlay class, an instance of a WaveOut class was held inside the OggPlayback class. This WaveOut class was created to wrap the Win32 WaveOut* API, which is used to make the computer play the audio.

[5] Ogg Vorbis Codec Project, http://www.xiph.org/ogg/vorbis/

[Figure: UML class diagram. Classes shown: PlaybackInterface «interface» (+PlaybackInterface(in DataSource : StreamBufferInterface), +isPlaying(), +Play(), +Stop()), OggPlayback (-playBackThread()), FileWriterPlayback, VideoPlayback, OggPlay (-vorbis_close_func(), -vorbis_read_func()), StreamBufferInterface «interface», libvorbis, WaveOut (+open(in rate : int, in channels : int), +writeAudio(in data : char, in size : int)). Annotation: RAW audio travels between these objects.]

Figure 4.4 UML Class Diagram of OggPlayback

4.1.4 Bitmap Class

The bitmap class was one of fundamental importance which wasn’t considered in the design and had to be designed and created during implementation. It would provide a logical abstraction of an array of bits that represented different indexes in the stream. Such an array was used in Announce packets, described in section 3.10.5. The class became a very important one which was at the centre of most objects, including the StreamBuffers.

Internally it would use a malloced char array of n/8 + 1 bytes, where n was the number of indexes the bitmap represented. The class would also keep an integer which represented the logical start of the array. This was so the array could start at an arbitrarily high index while the char array itself remained small.

Figure 4.5 shows the class definition of a bitmap, highlighting each function. The operations the functions carried out were comparatively simple, but care had to be taken to make sure the class would be thread-safe. During the testing stage the class had to be altered many times to ensure deadlock didn’t occur and to ensure no two threads were able to alter the contents concurrently.

The class would be created with a single parameter in the constructor which set the size of the map in bits. The default would be for the map to start at index 0. The setter and getter methods {set,get}Start allowed calling classes to move the bitmap’s starting index and to get the current starting index. When the starting index was moved along, the internally stored byte array would be updated by setting and un-setting bits appropriately. If the new start was not within the range of the old array, all the bits would be set to zero. The public get and set methods allow the calling code to get or set the value of a specific index in the bitmap. There are also two private get and set methods with an optional last parameter called gotSemaLock, which allows internal code that already holds the semaphore to skip the locking and so avoid deadlock. Finally the bitmap has only one other set of functions of interest, the {or,and,not}Bitmap functions. These allow a bitwise operator to be applied to all values which are in the range of both bitmaps.
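The behaviour described above can be illustrated with a short Python sketch. It is not the project's class: the name `SlidingBitmap`, the method names and the use of `threading.Lock` in place of the semaphore are assumptions made for illustration, and the {or,and,not}Bitmap operations are omitted. The private `_get_unlocked` and `_set_unlocked` helpers play the role of the gotSemaLock variants, letting `set_start` reuse them while it already holds the lock.

```python
import threading


class SlidingBitmap:
    """Bit array whose first bit stands for an arbitrary logical index."""

    def __init__(self, size, start=0):
        if size <= 0:
            raise ValueError("size must be positive")
        self._size = size
        self._start = start
        self._bits = bytearray(size // 8 + 1)  # n/8 + 1 bytes, as in the text
        self._lock = threading.Lock()

    def _locate(self, index):
        """Map a logical index to (byte position, bit mask)."""
        offset = index - self._start
        if not 0 <= offset < self._size:
            raise IndexError(f"index {index} outside bitmap range")
        return offset // 8, 1 << (offset % 8)

    def _get_unlocked(self, index):
        byte, mask = self._locate(index)
        return bool(self._bits[byte] & mask)

    def _set_unlocked(self, index, value):
        byte, mask = self._locate(index)
        if value:
            self._bits[byte] |= mask
        else:
            self._bits[byte] &= ~mask & 0xFF

    def get(self, index):
        with self._lock:
            return self._get_unlocked(index)

    def set(self, index, value):
        with self._lock:
            self._set_unlocked(index, value)

    def get_start(self):
        with self._lock:
            return self._start

    def set_start(self, new_start):
        """Slide the window forward, keeping bits that remain in range."""
        with self._lock:
            if new_start < self._start:
                raise NotImplementedError("moving the start backwards")
            old_end = self._start + self._size
            kept = [i for i in range(new_start, old_end) if self._get_unlocked(i)]
            self._bits = bytearray(len(self._bits))
            self._start = new_start
            for i in kept:
                self._set_unlocked(i, True)
```

Rebuilding the array bit by bit is slower than shifting whole bytes, but it sidesteps the case where the new start is not a multiple of 8.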

4.2 Problems Encountered

During implementation of the design numerous problems occurred that caused design changes to be made, and many “hacks” to be introduced. The noteworthy problems are explained in this section.

bitmap
    Attributes:
        -mapArray : char
        -logicalSize : int
        -physicalSize : int
        -hSema : Semaphore
    Operations:
        +bitmap(in size : int)
        +get(in index : int) : bool
        -get(in index : int, in gotSemaLock : bool) : bool
        +set(in index : int, in value : bool)
        -set(in index : int, in value : bool, in gotSemaLock : bool)
        +getStart() : int
        +setStart(in newStart : int)
        +andBitmap(in bitmap : bitmap)
        +orBitmap(in bitmap : bitmap)
        +notBitmap(in bitmap : bitmap)

Figure 4.5 UML Class Diagram of bitmap

4.2.1 StreamBuffer changing without notification

The StreamBuffer was a key piece of abstraction which allowed either a memory based buffer or a file based source to be represented as the same object. The abstraction worked perfectly until the concept of “Start” was introduced. For example the memory StreamBuffer would have a start in the middle of the buffer, with half the buffer storing old, used pieces and half storing new, soon to be used pieces. The problem comes when deciding where the middle (or start) of this stream is, and when it should be incremented. It was decided that the middle would be determined by the playback device. For example, the OggPlayback object would be responsible for moving the middle along, causing old data to expire and allowing new data to be downloaded.

Indirectly this created the problem that the PeerManager wouldn’t be notified when there was more space in the buffer or, in the case of the FileStreamBuffer, when new pieces had been added. A hack to avoid this problem was to have a thread inside the PeerManager which would continuously poll the StreamBuffer checking for changes; if one was detected the correct action would be taken (i.e. request a new piece, or announce that a new piece was available).

4.2.2 Concurrency Issues

One thing the UML didn’t show was the lifetimes of objects and how different objects may interact with the same object concurrently. It was always assumed that many threads would try accessing the same objects, so a lot of concurrency controls were put in place; however there were some cases which weren’t obvious enough and caused a few problems. One such case was the peer list object kept inside the PeerManager. It would be continuously accessed from many threads inside the PeerManager and all worked fine until 16 peers were online, at which point the accesses started to collide and the list would become corrupted.

4.2.3 Self Connecting Peer & Peers Connecting Both Ways

A design flaw which was also a feature made it impossible to tell if you had already connected to a specific peer. The reason was that peers should not be identified by their IP address and port alone; instead they should be identified by their PeerID. However, for security reasons the peer ID was kept secret, so only the tracker would know a peer’s ID. This left the problem where a peer may connect out to another peer and later that peer may connect in, causing two way connections. Even more alarming is that a peer may connect in to itself.

Both of these situations cause unneeded additional data to be transferred. To solve the self connecting problem a hack was made in the tracker which stopped the requesting peer’s details being included in the list returned. The problem of two way connections couldn’t be solved as easily without a redesign. A suggested solution would be to change the peer handshake protocol to send the peer’s ID. This would then allow a peer to avoid connecting to the same peer twice. However the peer IDs would no longer be secret, so the tracker would have to be changed to accept commands for a specific peer ID only from the IP which first registered it. This is still subject to exploits such as IP spoofing; however, that seems an acceptable compromise.
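A minimal sketch of the suggested redesign follows, assuming the handshake has been extended to carry the remote peer's ID. Nothing here exists in the implementation: the `ConnectionRegistry` class and its method names are hypothetical, and they only show how a peer could refuse self connections and second connections once IDs are exchanged.

```python
class ConnectionRegistry:
    """Tracks connected peer IDs so duplicates and self-connections are refused."""

    def __init__(self, own_peer_id):
        self.own_peer_id = own_peer_id
        self.connected = set()

    def accept(self, remote_peer_id):
        """Return True if a connection to this peer should be kept."""
        if remote_peer_id == self.own_peer_id:
            return False  # we have connected to ourselves
        if remote_peer_id in self.connected:
            return False  # already connected, possibly in the other direction
        self.connected.add(remote_peer_id)
        return True

    def release(self, remote_peer_id):
        """Forget a peer once its connection has closed."""
        self.connected.discard(remote_peer_id)
```

Each side would call `accept` with the ID received in the handshake and close the socket when it returns False.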

4.3 Algorithms Used

The majority of the program uses very simple algorithms. This is due to good Object Orientated practice and good design upfront. These two factors allowed the program to be relatively easy to code, and any problems were easily fixed. This section shows the most complicated algorithms used in the program.

4.3.1 FindNextPiece

The FindNextPiece method was responsible for looking at all the PeerConnection objects and deciding which piece was next to be downloaded. This algorithm is displayed in Figure 4.6. The method consists of two main loops: the outer loop increments the piece being looked for and the inner loop cycles through all peers. The outer loop runs until a suitable piece is found. For each candidate piece the inner loop checks first whether the peer has the piece and then whether the peer is free to send it. If both conditions are true the method returns with the piece and the peer. Otherwise the next peer in the list is considered, and this continues until there are no more peers. In that case the inner loop breaks and the outer loop increments to the next piece, at which point the inner loop starts again. This sequence of events continues until the outer loop breaks.
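The two loops can be sketched in Python as follows. This is an illustrative reading of the description and of the flowchart in Figure 4.6, not the project's code: the `Peer` record, the explicit `first`/`last` piece range and the `have` and `downloading` sets are assumed stand-ins for state the real PeerManager and PeerConnection objects hold.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    pieces: set = field(default_factory=set)  # pieces the peer has announced
    busy: bool = False                        # True while it is sending to us


def find_next_piece(first, last, have, downloading, peers):
    """Return (piece, peer) for the first obtainable missing piece, else None."""
    for piece in range(first, last):          # outer loop: candidate piece
        if piece in have or piece in downloading:
            continue                          # not wanted, try the next piece
        for peer in peers:                    # inner loop: candidate peer
            if piece in peer.pieces and not peer.busy:
                return piece, peer
        # no peer could supply this piece, so move on to the next one
    return None


peers = [Peer("a", {0, 1}, busy=True), Peer("b", {1, 2})]
print(find_next_piece(0, 4, have=set(), downloading=set(), peers=peers))
```

In this example piece 0 is skipped because its only holder is busy, so the search settles on piece 1 from peer "b".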

[Figure: flowchart. The method starts at FindNextPiece(X = 0). Decision nodes: Have we already got piece X? Are we currently downloading this piece? Do we want piece X? Has peer got piece X? Is peer free? More peers exist? Actions: Look for peer with piece X, Get next peer, Look for next piece (X = X + 1). The method ends either at "Found next piece and peer", after which the piece can be requested from the peer, or at "Failed to find any piece", where the method returns after a failure.]

Figure 4.6 Flowchart of FindNextPiece

4.3.2 PeerConnection Thread

This thread is responsible for sending data to and receiving data from the TCP socket. The following pseudo code shows how it works:

    While (connected) {
        Loop through send queue {
            Send(packet in queue)
            If TCP errors occurred
                exit
        }

        // On the first pass we always want 4 bytes
        // (which is the length of the packet)
        Size = 4
        FirstPass = true

        // This loops until the size of data has been read
        While (Size > 0) {
            Recv(Size bytes of data)
            If TCP errors occurred
                exit

            If FirstPass {
                // The 4 bytes just read hold the length of the rest of the packet
                Size = dataLen = length decoded from the 4 bytes read
                FirstPass = false
            } else {
                Size = Size - amount of data read
            }
        }

        // Now generate a packet
        packet = PacketFactory(data)

        // This is where the application logic goes
        // to decide what to do
        DecideWhatToDo(packet)
    }

As you can see this code will continue to loop while the Boolean connected is true. Inside the loop the first thing which happens is that all queued messages are sent. This should be clear from the first few lines. The remaining bulk of the code is there to receive data. The reason for its complexity and length is that we are receiving the data in blocks instead of as a stream. This, of course, is a limitation of the Winsock API which should have been abstracted out with a class able to expose a stream of data.

The code works by first reading 4 bytes of data which, due to the way the packets are formatted, represent the length of the remaining packet. The code will then loop reading this remaining number of bytes. The code has to loop instead of trying to request all the bytes directly because the incoming data may have been fragmented across many IP packets, and as such recv may return only a fraction of the total each time. Once Size reaches zero all the data has been read and the code drops down to the PacketFactory. This factory class takes raw packet data and creates an object which logically represents the data. This object is then dealt with and the program continues to loop for more data.
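The receive loop can be illustrated with Python's standard socket module. This is a sketch under stated assumptions rather than the project's Winsock code: the helper names are invented, and the 4-byte length is assumed to be an unsigned integer in network byte order, which the text does not specify. Unlike the pseudo code, the length prefix itself is also read through the loop, so a short read of those first 4 bytes is handled too.

```python
import socket
import struct


def recv_exact(sock, size):
    """Keep calling recv until exactly `size` bytes have arrived."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # an empty read means the peer closed the connection
            raise ConnectionError(f"closed with {remaining} bytes outstanding")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_packet(sock):
    """Read one length-prefixed packet and return its body."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))  # assumed byte order
    return recv_exact(sock, length)


# Demonstration over a local socket pair, with the body sent in two parts.
sender, receiver = socket.socketpair()
sender.sendall(struct.pack("!I", 5) + b"he")
sender.sendall(b"llo")
print(read_packet(receiver))
sender.close()
receiver.close()
```

Wrapping the loop in a helper such as `recv_exact` is one way to provide the stream-like abstraction the previous paragraph says was missing.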

4.4 Summary

This chapter has identified the changes made to the design and discussed the problems encountered. Algorithms have been discussed in detail for the more important classes.

The next chapter will display the look and feel of the system. Each main program of the system will be discussed, with screen dumps to explain normal operation.


5 System in Operation

The previous chapter discussed the implementation of the system, how it was made and what issues were encountered. However it didn’t discuss how the system looked and operated to the user. This chapter will show how the programs appear and how to use them.

The actual programs themselves are all very simple. Since this project is more focused on the research and design of a new protocol, features such as GUIs were not important and thus not included in the programs. The programs were designed to be as simple and informational as possible. For these reasons all programs are simple console based ones which log very detailed information on what is happening internally. This console log is also outputted to a file for later inspection.

While the programs are running they require no user input. The only input they will accept is a ctrl+c to close the program. To control the programs, command line arguments may be specified.

5.1 Tracker

The first component of the system is the tracker. The tracker is always needed in the network to provide a peer location service. The tracker is started with the following arguments:

tracker.exe {ogg filename}

You must specify an Ogg filename to the tracker. The only reason you need to do this is so the tracker can read some meta-data about the stream. Once this has been carried out the tracker starts running and waits for incoming connections.

[15:05:17] Program starts
[15:05:17] Now serving: d:\\Jeff Wayne - War of the Worlds - Disc 1 - 128 kbps.ogg
[15:05:17] Created server socket on port 8000
[15:05:22] Accepted new connection [10.36.152.128]
[15:05:22] 10.36.152.128 http://10.36.152.128:8000/?action=join&peer-id=EGA010000001opinctfh&peer-ip=10.36.152.128&peer-port=4567
[15:05:22] Closed connection [10.36.152.128]
[15:05:23] Accepted new connection [10.36.152.128]
[15:05:23] 10.36.152.128 http://10.36.152.128:8000/?action=list&peer-id=EGA010000001opinctfh
[15:05:23] Closed connection [10.36.152.128]
[15:05:56] Accepted new connection [10.36.152.130]
[15:05:56] 10.36.152.130 http://10.36.152.128:8000/?action=join&peer-id=EGA010000003ccaapzal&peer-port=4567
[15:05:56] Closed connection [10.36.152.130]

Figure 5.1 Log generated by a tracker

Figure 5.1 shows the typical connection of two peers. The first peer connects from 10.36.152.128 with the ID EGA010000001opinctfh. You might notice that the first 11 characters of the peer ID don’t appear randomly generated. They are in fact the name of the computer the peer is running on. For debugging and testing reasons the peers would generate semi-random PeerIDs prefixed with their computer name.


Once the peer has connected it issues a join with its port and IP. Next, the peer re-connects and issues the list command to obtain a peer list. In this case the list will be empty. The second peer now joins from 10.36.152.130 and again issues the join and then list commands.
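The join and list requests in Figure 5.1 could be handled along the following lines. This is a hypothetical sketch using Python's standard `urllib.parse`, not the TrackerMain implementation: the `TrackerState` class, its return values and the fallback to the connecting address when `peer-ip` is absent are assumptions. The query parameter names are the ones visible in the log.

```python
from urllib.parse import parse_qs, urlsplit


class TrackerState:
    """Holds the peers that have joined, keyed by peer ID."""

    def __init__(self):
        self.peers = {}  # peer-id -> (ip, port)

    def handle(self, url, remote_ip):
        """Process one tracker request and return the resulting peer list."""
        query = parse_qs(urlsplit(url).query)
        action = query["action"][0]
        peer_id = query["peer-id"][0]
        if action == "join":
            # Use the advertised IP if present, else the connecting address.
            ip = query.get("peer-ip", [remote_ip])[0]
            self.peers[peer_id] = (ip, int(query["peer-port"][0]))
            return []
        if action == "list":
            # Leave the requesting peer out of its own list.
            return [addr for pid, addr in self.peers.items() if pid != peer_id]
        raise ValueError(f"unsupported action: {action}")


tracker = TrackerState()
base = "http://10.36.152.128:8000/"
tracker.handle(base + "?action=join&peer-id=EGA010000001opinctfh"
               "&peer-ip=10.36.152.128&peer-port=4567", "10.36.152.128")
print(tracker.handle(base + "?action=list&peer-id=EGA010000001opinctfh",
                     "10.36.152.128"))
```

Filtering the requester out of the list mirrors the tracker hack described in section 4.2.3.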

5.2 PeerSource

The second most important component is the PeerSource. The usage of this program is just as simple as the tracker. The program accepts parameters like so:

PeerSource.exe {tracker url} {source file} [{ip}]

The tracker URL and the source file are both required, the IP parameter is optional. An example use would be:

PeerSource.exe http://tracker.com/ mymusic.ogg 192.168.0.1

This would connect to the tracker at tracker.com and begin sharing the stream pulled from the file mymusic.ogg. On the join request the peer will also indicate it is coming from IP address 192.168.0.1. It has to indicate its IP address for situations where the tracker and PeerSource are run on the same host. In this case, the tracker will see the source connecting from 127.0.0.1 and record that IP address. Later, if the tracker sends a peer list, the IP 127.0.0.1 would be listed, which would in turn cause problems for the remote peers.

Once the source has started, it will output textual information similar to the tracker. This is shown in Figure 5.2. The figure’s example shows the PeerSource first connecting to the tracker at 10.36.152.128:8000. The tracker returns an empty peer list, presumably because the source was the first to connect. About 30 seconds later a remote peer connects in from 10.36.152.130:2296. The peers begin by exchanging handshakes and announcing which pieces of the stream each has. The remote peer then sends a request and the source peer replies with the data. Once the data has been transmitted, the remote peer announces that the piece has been completed and continues to request more.

Eventually the program is ended at 15:07:20 when a ctrl+c is pressed at the console. This ctrl+c tells the program to cut all connections and to try and clean up. A few seconds later the program has dealt with any clean up and quit.

[15:05:22] ***************** PROGRAM START *****************
[15:05:22] Sending Tracker Join (10.36.152.128:8000)
[15:05:22] Getting Tracker List (10.36.152.128:8000)
[15:05:23] List: returned
[15:05:57] [10.36.152.130:2296] 8 PeerConnected
[15:05:57] [10.36.152.130:2296] OUT 19 Handshake Version:1,0 Andrew's Client
[15:05:57] [10.36.152.130:2296] OUT 7 Announcement Start:0-24 111111111111111111111111
[15:05:57] [10.36.152.130:2296] IN 19 Handshake Version:1,0 Andrew's Client
[15:05:57] [10.36.152.130:2296] IN 14 Announcement Start:0-24 000000000000000000000000
[15:05:57] [10.36.152.130:2296] IN 8 Request Start:0 End:1
[15:05:58] [10.36.152.130:2296] OUT 10004 Data Start:0
[15:05:58] [10.36.152.130:2296] IN 14 Announcement Start:0-24 100000000000000000000000
[15:05:58] [10.36.152.130:2296] IN 8 Request Start:1 End:2
[15:05:58] [10.36.152.130:2296] OUT 10004 Data Start:1
[15:05:58] [10.36.152.130:2296] IN 14 Announcement Start:0-24 110000000000000000000000
[15:07:20] Quiting.... (ctrl+c)
[15:07:21] [10.36.152.130:2296] 9 PeerDisconnected
[15:07:21] [10.36.152.130:2296] SocketError Socket Error 10053
[15:07:21] ****************** PROGRAM END ******************

Figure 5.2 Log generated by a PeerSource (altered to improve readability)

5.3 PeerClient

This is the last program in the system, and the one which most of the hosts in the network would be using. It works almost identically to the PeerSource because it uses the same code base. It is started using the following parameter:

PeerClient {tracker url}

This time only a tracker URL is needed. Once connected to the tracker the tracker will explain all the other details such as media type etc.

[15:05:55] ***************** PROGRAM START *****************
[15:05:55] Sending Tracker Join (10.36.152.128:8000)
[15:05:56] Getting Tracker List (10.36.152.128:8000)
[15:05:57] [10.36.152.128:4567] 8 PeerConnected
[15:05:57] [10.36.152.128:4567] OUT 19 Handshake Version:1,0 Andrew's Client
[15:05:57] [10.36.152.128:4567] OUT 14 Announcement Start:0-24 000000000000000000000000
[15:05:57] [10.36.152.128:4567] IN 19 Handshake Version:1,0 Andrew's Client
[15:05:57] Ogg: Now playing
[15:05:57] Ogg: TITLE=War of the Worlds
[15:05:57] Ogg: ARTIST=Jeff Wayne
[15:05:57] Ogg: Bitstream is 2 channel, 44100Hz @ 128kpbs
[15:05:57] [10.36.152.128:4567] IN 7 Announcement Start:0-24 111111111111111111111111
[15:05:57] [10.36.152.128:4567] OUT 8 Request Start:0 End:1
[15:05:58] [10.36.152.128:4567] IN 10004 Data Start:0
[15:05:58] [10.36.152.128:4567] PartComplete 10004 Data Start:0
[15:05:58] [10.36.152.128:4567] OUT 14 Announcement Start:0-24 100000000000000000000000
[15:05:58] [10.36.152.128:4567] OUT 8 Request Start:1 End:2
[15:05:58] [10.36.152.128:4567] PartComplete 10004 Data Start:1
[15:05:58] Ogg: Read 8500 bytes 0 missing
[15:05:58] Ogg: Read 8500 bytes 0 missing
[15:07:25] [10.36.152.128:4567] SocketError Socket Error 10054
[15:07:25] [10.36.152.128:4567] 9 PeerDisconnected
[15:07:25] Quiting.... (ctrl+c)
[15:07:26] ****************** PROGRAM END ******************

Figure 5.3 Log generated by a PeerClient (altered to improve readability)


Figure 5.3 shows the output generated from the PeerClient’s side when connecting to the PeerSource listed in Figure 5.2. It connects to a tracker, then to a peer, and quits after the remote peer disconnects.

5.4 Summary

The aim of this chapter was to demonstrate how the system looked and operated to the user. This aim has been carried out and the very basic user interfaces have been explained. This chapter however didn’t discuss the correctness of the system or how it actually operated internally. The next chapter will test and evaluate the system to establish its correctness and to record statistics on its usefulness.


6 Testing

This chapter will focus on testing the correctness and usefulness of the system. Sections 6.1 and 6.2 focus on testing how the system behaves and whether it follows the specifications laid out in the design chapter. Section 6.1 begins by black box testing some classes and then progresses into section 6.2, which combines one or more of these classes to continue performing bottom up testing.

Not all test results will be shown in this section. Any major issues found in the program will be highlighted; all other classes either worked as expected or failed first time and were fixed trivially, and so are not noteworthy.

Section 6.3 tests how well the system performs in lab and real environments. This section then forms the base of data that will aid the analysis in the next chapter, Evaluation.

6.1 Unit Testing

Tables of tests were carried out on some of the classes used by the peers. These tables were devised by inspecting each method exposed on each class and systematically calling each of these methods with test data. The test data was chosen to fall into three categories: Typical, Extreme and Erroneous.

The majority of tests were carried out at development time therefore not everything is documented. However to help provide completeness some tests have been documented.

6.1.1 Bitmap Class

As discussed in section 4.1.4 the Bitmap class became one of the most important classes, and as such repeated tests were carried out. The table of test cases is displayed in section 10.1 of the Appendix. Here we will discuss the failures and why something went wrong.

The first set of failures, which has since been fixed, was in the setStart() method. This method moved the logical beginning of the array up or down, and as such had to move the contents as well. The problem was twofold. Firstly, setStart() would behave erratically and sometimes move only segments of the array instead of the whole array. Secondly, setStart() would only move at the byte level (i.e. in chunks of 8 bits); however requests made to setStart() didn’t always start on a multiple of 8, and as such the beginning index and the array would get out of sync.

The setStart() problem was fixed by rethinking the logic of the class and re-coding most of it with additional internal variables such as logicalStart, physicalStart, logicalSize, physicalSize. These all help keep track of the internal states instead of the states being calculated on the fly from just a few variables.

The next series of failures, labelled F1 and F2, were caused by the lack of checking inside the constructor. It should be noted that these tests still fail since the class hasn’t been updated. It was considered acceptable for these tests to fail since they wouldn’t interfere with normal operations.

The final failure is F3. For the reasons listed above, a large amount of time was put into making setStart() function correctly. Even so, setStart() doesn’t function when the new start is smaller than the current one, since this added more complexity and is a feature which wouldn’t be used. The code could be added in the future to allow this, but at the moment an exception is thrown indicating the code isn’t finished.


6.1.2 StreamBuffer Class

The StreamBuffer class was the main data source for the PeerClient. It was very important because data was being written to and read from it all the time. If this class didn’t behave correctly the stream would become corrupted.

The test cases in section 10.2 show that the class operated mostly correctly. The only faults with the class occurred when it was constructed with incorrect parameters. The numbers passed were erroneous and the constructor didn’t provide valid checking for these cases. The last test to fail was when an extreme parameter was passed to the constructor. This parameter made the class malloc many gigabytes of RAM. After mallocing 2GB successfully the program (including the development environment and some Windows applications) crashed with fatal Out Of Memory errors. This is to be expected since the memory limit of the OS was reached. Perhaps a hard limit should have been set on the StreamBuffer; however this was never considered in the design.

6.2 Integration Testing

Testing of high level components requires an exponential increase in test cases and time. For this reason the testing in this section only includes one class. The other high level classes have been tested to function correctly, however individual test cases were not carried out.

6.2.1 PeerConnection Class

This is the class responsible for logically representing the connections to and from the remote Peers. It is also responsible for sending and receiving packets. Any received packets are also de-serialised into PeerPacket objects. Due to the difficulty in making test cases for incoming data from a remote source, these tests are excluded from the official test cases. Instead, they were carried out during the implementation of the class.

Figure 6.1 shows the components which make up the PeerConnection class. Each individual component has been tested separately and is reported as working correctly. When the PeerConnection is used it may expose flaws in the sub-components which were not previously tested with that kind of data.

The test results shown in section 10.3 indicate that very few problems were found. During implementation, however, PeerConnection did have some problems, mostly relating to concurrency issues, as discussed in section 4.2.2.

[Figure 6.1: UML class diagram. PeerConnection (+Open(), +Close(), +GetPieceMap(), +RequestPiece(), -SendPacket()); «interface» StreamBufferInterface (+StreamBuffer(), +Read(), +Write(), +Peek()); «utility» PeerPacketFactory (+Read(in Data : char) : PeerPacket); «interface» PeerPacket; bitmap (+get(), +set()).]

Figure 6.1 UML Class Diagram of a PeerConnection

6.3 Performance Testing

Testing of the system took place in a controlled environment with a number of hosts. At most 16 hosts took part over an unloaded 10/100 Mbit network. While testing took place all hosts logged data sent and received. Eleven different tests took place over the course of two days. The tests, and the changes made in each one, are listed in Table 6.1.

Test | Hosts                   | Notes
1    | 3 (2 Peers, 1 Source)   |
2    | 6 (5 Peers, 1 Source)   |
3    | 6 (5 Peers, 1 Source)   |
4    | 6 (5 Peers, 1 Source)   | Delayed starting of source
5    | 6 (5 Peers, 1 Source)   | 30 Seconds staggered start
6    | 12 (11 Peers, 1 Source) |
7    | 13 (12 Peers, 1 Source) |
8    | 15 (14 Peers, 1 Source) |
Following tests with changed algorithm
9    | 16 (15 Peers, 1 Source) |
10   | 16 (15 Peers, 1 Source) |
11   | 6 (5 Peers, 1 Source)   |

Table 6.1 List of tests carried out on the system

The tests were carried out by streaming a 45 minute long Ogg Vorbis file of the audio book “War of the Worlds”. The stream was encoded at 160kbps. Not all tests ran for the full 45 minutes, however they did run long enough to give a good sample of results.

Tests 1-3 and 6-11 were run without any special settings. In each case the source was started first, and then the remaining peers were started within the next 30 seconds. Test 4 had the source start after the other peers; this made all the peers begin at exactly the same time and allowed the load placed on the source to be tested. Test 5 ran over a 5 minute period with each peer starting 30 seconds after the previous one. This was an attempt to simulate new peers joining regularly during the playback of the stream.

The final 3 tests, 9-11, were carried out in the same way, however the FindNextPiece algorithm was changed to allow a better rotation of the peer list. The change was made because preliminary results were not promising enough, and the system was then re-tested. Specifically, the algorithm described in section 3.12.1 was changed from a “uniformly cycling” approach to a “self scoring” one.

Logs of all the tests can be found online at the working documents website. Tables of results analysed from the logs may also be found with the working documents online.

Test       | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10    | 11
Hosts      | 3     | 6     | 6     | 6     | 6     | 12    | 13    | 15    | 16    | 16    | 6
Efficiency | 18%   | 49%   | 21%   | 3%    | 40%   | 56%   | 22%   | 61%   | 33%   | 61%   | 25%
Overheads  | 0.39% | 0.78% | 1.20% | 1.08% | 0.65% | 2.35% | 2.11% | 2.59% | 0.79% | 1.76% | 0.87%

Table 6.2 Summarised results from 11 test cases

Table 6.2 shows the results of interest. The row labelled Efficiency is a percentage showing how much more effective the streaming was compared to a single source stream. It is calculated by summing the amount of data sent by each peer, and then comparing this to the amount of data sent by the source peer. This is shown in this equation:

For example, if there were two peers (one source and one receiving peer), this number would be 0% because the source peer sent 1 whole copy, the receiving peer sent nothing, thus evaluating to:

This efficiency will obviously never reach 100%, and the maximum will vary depending on the number of hosts within the network. For a more detailed explanation see paragraph 3 of section 3.14.1.

The last row labelled Overheads is the calculation shown in paragraph 2 of section 3.14.1. It represents the amount of control traffic divided by the amount of data received.
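The two metrics can be sketched in Python as follows. The efficiency formula is one reading consistent with the description and the two-peer example above (the fraction of all data sent that came from non-source peers), not necessarily the exact equation used in the report. The upper bound assumes every receiving peer needs exactly one copy of the stream and the source must send at least one copy. Function and parameter names are hypothetical.

```python
from typing import Sequence


def efficiency(source_bytes_sent: int, peer_bytes_sent: Sequence[int]) -> float:
    """Fraction of all stream data that was sent by non-source peers."""
    forwarded = sum(peer_bytes_sent)
    total = source_bytes_sent + forwarded
    return forwarded / total if total else 0.0


def max_efficiency(hosts: int) -> float:
    """Assumed upper bound: the source sends one copy, peers supply the rest."""
    receivers = hosts - 1
    if receivers < 1:
        return 0.0
    return (receivers - 1) / receivers


def overhead(control_bytes: int, data_bytes_received: int) -> float:
    """Control traffic divided by the amount of stream data received."""
    return control_bytes / data_bytes_received
```

With one source and one receiving peer, efficiency(1000, [0]) evaluates to 0.0, matching the 0% example, and under the stated assumption the bound for three hosts is 0.5.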

6.4 Summary

This chapter has tested the correctness and the operation of the system. Not many tests were carried out, however the results gained in section 6.3 show that the system functions correctly. Section 6.1 showed a few components of the system being tested, then in section 6.2 these components were linked together to provide results for an integration test. The chapter finished with performance tests of the system across many machines.

The next chapter takes the results obtained in section 6.3 and analyses and evaluates them. This will help form the basis of the conclusion.


7 Evaluation

This chapter analyses the data collected in the previous chapter. It uses the approaches discussed in section 3.14 to evaluate the system. This includes statistical and graphical methods to calculate how well the system performed against the requirements set in section 3.1. These results are also compared to the predicted results generated in section 3.14.1.

7.1 Efficiency

As explained in sections 6.3 and 3.14.1, efficiency measures how much more effective the system is compared to single point streaming. This is the metric used to evaluate requirement 1.4 (“move the stream distribution load away from the source”). Table 6.2 shows results in the range 3% to 61%, with an average of 35% and a standard deviation of 19%. This indicates that the protocol can already be up to roughly 3 times better than existing deployed streaming technologies. The standard deviation, however, suggests that the protocol is inconsistent and results can vary greatly. The algorithm change improved the average by 6 percentage points and maintained a similar standard deviation. However, due to the large variance observed and the low number of tests, it cannot be inferred that the algorithmic change made much, if any, improvement.
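The summary statistics quoted above can be reproduced from the Efficiency row of Table 6.2 with a few lines of Python. This is only a check of the arithmetic; the use of the sample standard deviation is an assumption, although the population form rounds to the same figure.

```python
from statistics import mean, stdev

# Efficiency row of Table 6.2, tests 1 to 11, in percent.
efficiency_pct = [18, 49, 21, 3, 40, 56, 22, 61, 33, 61, 25]

before_change = efficiency_pct[:8]  # tests 1-8, original algorithm
after_change = efficiency_pct[8:]   # tests 9-11, changed algorithm

overall_mean = mean(efficiency_pct)
overall_sd = stdev(efficiency_pct)  # sample standard deviation (assumption)
improvement = mean(after_change) - mean(before_change)

print(f"mean {overall_mean:.1f}%, sd {overall_sd:.1f}%, change {improvement:+.1f} points")
```

With only three tests after the change, the difference in means is well inside one standard deviation, which supports the caution expressed in the text.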

To further evaluate the results, Figure 7.1 displays a graph of efficiency versus peer count. The graph shows two sets of points and a shaded area. The two data sets are the tests before and after the algorithm change. The shaded area indicates the maximum possible efficiency achievable for that number of peers. In theory no point should be outside the shaded area, and the closer a point is to the top of it the better. No line of best fit was drawn on the graph because the results are so widely spread.

[Figure 7.1: scatter plot titled “Percentage of the stream forwarded by non-source peers”. x-axis: Peers (0 to 16); y-axis: Efficiency % (0 to 100). Legend: Theoretical Best (shaded area), Tests 1, Tests 2 (Rotate Algorithm).]

Figure 7.1 Graph of Percentage of the stream forwarded by non-source peers

The graph shows that the majority of points are above 20%, however there is one stray point at 3%. The 3% result was generated by test 4, the test with the delayed starting of the source peer. It can be argued that this result was due to the Source Saturation problem discussed in section 3.12.2, where all peers see the source announce at the same time and therefore all request from it at the same time. This is a problem that requires fixing in the future. Section 3.12.3 may provide one solution.

From the graph there may also be a slight upwards trend of efficiency with more peers. This might be caused by timing issues between the peers. If there are more peers online, the drain on the source is greater, and so the source's announcements to the peers would lag slightly. This would cause announcements sent by normal peers to arrive before the source peer’s, allowing some peers to request the stream from a neighbour instead of the source.

The 5th test, in which each peer was started 30 seconds after the previous one, returned a result of 40%. This number is roughly the same as the average and suggests that peers joining at different times throughout the stream do not impact performance.

Overall the efficiency appears to be good, however further testing should go into changing the algorithms used. Even with the current un-tuned implementation, tests have shown the system reduces the bandwidth used on the source by between 1 and 3 times. Future versions could increase this value much further.

7.2 Overheads

The next measured criterion was the amount of overheads sent to provide the stream. This will help analyse the system’s ability to fulfil requirements 1.3 (stream data with low control overhead) and 1.5 (be scalable). In section 3.14.1 these figures were estimated and are displayed in Table 7.1 with the observed results:

Test                | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      | 10     | 11
Hosts               | 3      | 6      | 6      | 6      | 6      | 12     | 13     | 15     | 16     | 16     | 6
Overheads           | 0.39%  | 0.78%  | 1.20%  | 1.08%  | 0.65%  | 2.35%  | 2.11%  | 2.59%  | 0.79%  | 1.76%  | 0.87%
Estimated Overheads | 0.30%  | 0.60%  | 0.60%  | 0.60%  | 0.60%  | 1.20%  | 1.30%  | 1.50%  | 1.60%  | 1.60%  | 0.60%
Difference          | 0.09%  | 0.18%  | 0.60%  | 0.48%  | 0.05%  | 1.15%  | 0.81%  | 1.09%  | -0.81% | 0.16%  | 0.27%

Table 7.1 Predicted overheads compared to observed overheads

It can quickly be seen that the predicted results are lower than the observed results in all but one test. This means that either the predictions were wrong, or the system functioned worse than expected. In section 4.2.3 it was shown that there were problems with peers connecting to themselves, and with connections being made in both directions between peers. If this is the case, then a test with 6 hosts should really be compared against the prediction for 12, because logically each peer thought it was connected to 11 others. This might help explain why the overheads were higher than expected.
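The Difference row of Table 7.1, and the effect of the doubled connections, can be checked with a short Python sketch. The figures are copied from the table; the observation that the estimates work out at 0.1% per host is read off the table rather than taken from section 3.14.1, and the doubled estimate is only an illustration of the argument above.

```python
hosts = [3, 6, 6, 6, 6, 12, 13, 15, 16, 16, 6]
observed = [0.39, 0.78, 1.20, 1.08, 0.65, 2.35, 2.11, 2.59, 0.79, 1.76, 0.87]
estimated = [0.30, 0.60, 0.60, 0.60, 0.60, 1.20, 1.30, 1.50, 1.60, 1.60, 0.60]

# Observed minus estimated overhead, in percentage points.
difference = [round(o - e, 2) for o, e in zip(observed, estimated)]

# Every estimate in the table equals 0.1% per host.
per_host = {round(e / h, 3) for e, h in zip(estimated, hosts)}

# Illustration only: the estimate if each host behaved like two.
doubled_estimate = [round(0.1 * 2 * h, 2) for h in hosts]

print(difference)
print(per_host)
print(doubled_estimate)
```

For the 6-host tests the doubled estimate is 1.20%, which is close to the observed values for tests 3 and 4 but above those for tests 2, 5 and 11, so doubled connections can only be part of the explanation.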

Figure 7.2 is a scatter plot of the results obtained. The x-axis shows number of connected peers, whereas the y-axis shows the percentage of traffic sent which was control traffic. There are two data sets drawn, one is the predicted results, the other is the test results. For the test results a line of best fit has also been added.


Figure 7.2 Graph of protocol overheads depending on number of connected peers

From the graph it can again quickly be seen that the system used more overhead than predicted, but not by much. The gradient of the line is also steeper than predicted. Even with this greater gradient the overheads are still low for small numbers of peers. At 20 peers the overheads would be 2.6%, at 50 peers 6%, and at 100 peers 13%. These larger figures are a bit high and should be lowered in future implementations.
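A line of best fit of this kind can be computed by ordinary least squares, sketched below in Python over the Hosts and Overheads rows of Table 7.1. This is an illustration rather than the method used to draw Figure 7.2: the figure plots connected peers, which may not be the same quantity as the Hosts column, so the extrapolations from this fit are not expected to reproduce the figures quoted above exactly.

```python
hosts = [3, 6, 6, 6, 6, 12, 13, 15, 16, 16, 6]
overhead_pct = [0.39, 0.78, 1.20, 1.08, 0.65, 2.35, 2.11, 2.59, 0.79, 1.76, 0.87]

n = len(hosts)
mean_x = sum(hosts) / n
mean_y = sum(overhead_pct) / n

# Ordinary least squares: slope = covariance(x, y) / variance(x).
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hosts, overhead_pct))
sxx = sum((x - mean_x) ** 2 for x in hosts)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predicted_overhead(peers):
    """Overhead (in percent) that the fitted line gives for `peers` hosts."""
    return intercept + slope * peers


for peers in (20, 50, 100):
    print(peers, round(predicted_overhead(peers), 2))
```

The fitted slope is close to 0.105 percentage points per host, only slightly above the 0.1 per host of the estimates, with the gap coming mostly from the positive intercept.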

It should be noted that these overheads are only for connected peers. There may be thousands of peers on the network, but any one peer may only be connected to a small number of them. These results do, however, indicate that there is a maximum number of peers the client should keep connected. If too many are connected the client will begin to waste bandwidth; if too few, the client will not get the stream on time.

7.3 Summary

This chapter has shown that the system operated correctly, well and almost to expectations. Test results have been promising and future improvements can be looked into. The efficiency of the protocol seems good, and could be even better if a more suitable requesting algorithm is found. The overheads were a little high, but hopefully bug fixes to the program will solve this.

The next chapter will now bring together all the elements of the project and evaluate the entire project as a whole.


8 Conclusion

To complete the report this chapter recaps the aims of this project and discusses how well each aim has been achieved. The goals for the project were set out in the first chapter; here they are broken down into five main objectives. Each objective is followed by a discussion of how well it was achieved.

8.1 Project Goals

“This project aims to investigate current P2P and streaming research topics and highlight any flaws in these systems.” [Section 1.3 sentence 1]

This goal has been successfully covered in the background reading chapter of the project. Different P2P architectures were researched, such as Gnutella networks [see Section 2.2.2], Pastry and Chord networks [see Sections 2.4.1 and 2.4.3], and more specific streaming P2P networks such as ZIGZAG [see Section 2.3.4]. Each of these had its good and bad points discussed, with some thought given to the flaws and why they occurred.

“It will also integrate previously unrelated topics of P2P and streaming into a single solution.” [Section 1.3 sentence 2]

This goal has been only partly achieved. When the goal was written, the author was unaware that streaming and P2P had previously been combined. As such the project was unable to “integrate previously unrelated topics”, because the background chapter discovered proposals which had already combined these concepts. Even with this newly discovered research, this project was able to combine P2P and streaming in a new way, building on technologies such as BitTorrent [see Section 2.2.8].

“This solution will be developed by improving existing techniques whilst solving any flaws they may have” [Section 1.3 sentence 3]

The finished solution did improve on existing streaming techniques, and the results in Chapter 7 show that even the un-optimised solution was up to 3 times more efficient than current single source streaming. The solution however was not 100% successful in removing all the flaws. Scalability and reliability issues remain with the tracker approach; however in section 3.5 it was discussed how the tracker could be replaced by building on top of another P2P network.

“The developed solution must satisfy a list of requirements which will be derived and discussed in chapter 3.” [Section 1.3 sentence 4]

A list of requirements was clearly laid out in section 3.1 and the rest of the design and implementation was built around these. Each time a requirement was met or broken it was noted in the report. Overall all the requirements were met, however some to a greater degree than others.


“Once a suitable solution has been found, it will be scrutinized under numerous tests to find out its usefulness and tested to demonstrate how much more efficient or effective it is to current streaming solutions.” [Section 1.3 sentence 5]

The purpose of Chapters 6 and 7 was to address this single goal. This goal has been achieved, and it allowed the project to report its strengths and weaknesses.

In summary all five goals have been achieved, some to a higher degree than others, but nevertheless all achieved. These five points have also aided in the final conclusion of the report by logically breaking the project into five main criteria.

8.2 Future Work

No project is ever complete and this project is no exception. It has already been highlighted in the testing section that there is future work to be carried out on the piece picking algorithms. This can be in the form of mathematical analysis or empirical evidence. Empirical methods would be preferred, since it has already been seen that the mathematical predictions did not hold in all cases.

Other future work could also include testing the protocol on a larger scale, i.e. with more than 100 hosts. It could be possible in larger scale networks that a more structured approach to the distribution should be undertaken because unforeseen problems might occur. One foreseeable problem is if smaller highly connected groups of peers were formed out of the whole network, and these smaller groups get starved of the stream, then a large number of peers would be affected. Problems such as these must be thought about and tackled.

A final piece of future work would be to improve the underlying P2P network used. The tracker is an obvious bottleneck and this has been known since the design. Work should go into looking at how a Distributed Hash Table such as Pastry or Chord can be used to remove the need for trackers.

8.3 Summary

In the years to come streaming media is something which might become more and more popular. Today we are already seeing mobile phones that can make video calls. In such low bandwidth environments as mobile networks, solutions such as this are needed.

It is clear from the goals that the project has achieved what it set out to do. A successful P2P Streaming protocol was designed, which works more efficiently than current solutions. This was achieved through extensive research into the area, followed by a well designed system architecture, and concluded by successful tests. All in all, the project has achieved what it set out to do and maybe a little extra.


9 References


10 Appendix

10.1 Bitmap Test Cases

Test | Expected Result | Result
Construct, Deconstruct | Cleanly deleted object | As Expected
Construct(0) | Throw InvalidSize Exception | Nothing Thrown (F1)
Construct(1) | Returns OK | As Expected
Construct(2^31 -1) | Returns OK (but using up >256MB of memory) | As Expected
Construct(-1) | Throw InvalidSize Exception | Nothing Thrown (F2)

Following are done with a new Construct(32)

getSize(); | 5 bytes (32 / 8 + 1) | As Expected
getArray(); | 0x 00 00 00 00 | As Expected
set(0, false); | 0x 00 00 00 00 | As Expected
get(0); | False | As Expected
set(0, true); | 0x 80 00 00 00 | As Expected
set(0, true); get(0); | True | As Expected
set(31, true); | 0x 00 00 00 80 | As Expected
set(32, true); | Throw OutOfBound Exception | As Expected
set(0, true); setStart(0); | 0x 80 00 00 00 Start:0 | As Expected
set(0, true); setStart(1); | 0x 80 00 00 00 Start:1 | As Expected
set(0, true); setStart(8); | 0x 00 00 00 00 Start:8 | As Expected
set(8, true); setStart(8); | 0x 80 00 00 00 Start:8 | As Expected
set(0, true); setStart(31); | 0x 00 00 00 00 Start:31 | As Expected
set(8, true); setStart(32); | 0x 00 00 00 00 Start:32 | As Expected
setStart(32); setStart(0); | 0x 00 00 00 00 Start:0 | Throws Exception (F3)
not() | 0x FF FF FF FF | As Expected
not().and(0x00 00 00 00); | 0x 00 00 00 00 | As Expected
setStart(8);not().and(0x00 00 00 00 Start:0); | 0x 00 00 00 FF | As Expected

10.2 StreamBuffer Test Cases

Test | Expected Result | Result
Construct, Deconstruct | Cleanly deleted object | As Expected
Construct(0,0); | Throw Exception | Nothing Thrown (F1)
Construct(0,1); | Throw Exception | Nothing Thrown (F2)
Construct(1,0); | Throw Exception | Nothing Thrown (F3)
Construct(1,1); | Returns OK | As Expected
Construct(2^31 -1,1); | Returns OK (but uses a lot of RAM) | Mallocs nearly 2GB of RAM and crashes due to OutOfMemory (F4)
Construct(1,2^31 -1); | Returns OK (but uses a lot of RAM) | Wasn’t Tested

Following are done with a new Construct(10,10)

Read(100); | Reads 0 bytes | As Expected
Write(0, data); | Stores 10 bytes at index 0 | As Expected
Write(1, data); | Stores 10 bytes at index 1 | As Expected
Write(0, data);Write(10, data); | Stores at index 0, but stores nothing at index 10 (since it is out of range) | As Expected
Write(10, data); | Stores at index 10 and moves internal pointer to start at 10 | As Expected
Write(0, data);Read(100); | Reads 10 bytes | As Expected
Write(0, data);Write(1, data);Read(100); | |
Write(0, data);Read(100);Read(100); | Reads 10 bytes then Reads 0 bytes | As Expected
Write(0, data);Read(100);Write(1, data);Read(100); | Reads 10 bytes then Reads 10 bytes | As Expected
Write(0, data);Read(5);Read(5); | Reads 5 bytes, then Reads 5 bytes | As Expected
Write(0, data);Write(1, data);Read(6);Read(6);Read(10); | Reads 6 bytes, then Reads 6 bytes, then Reads 8 bytes | As Expected
Write(0, data);Peer(1); | |
Write(1, data);Peer(1); | |
Write(0, data);Write(1, data);Peer(1); | |


10.3 PeerConnection Test Cases

Test | Expected Result | Result
Construct, Deconstruct | Cleanly deleted object | As Expected
Construct(ip, port) | Constructs correctly | As Expected
Construct(incoming SOCKET) | Constructs, and sets internal state to open | As Expected
Open() with invalid IP | Throw PeerException | As Expected
Open() with invalid Port | Throw PeerException | As Expected
Open() with hostname | Resolve hostname and connect | As Expected
Open() when already open | Returns nothing | As Expected
Close() when open | Closes and cleans up | As Expected
Close() when closed | Returns nothing | As Expected
Remote connection closes | Throw PeerException and set closed | As Expected
isOpen() while open | Returns true | As Expected
isOpen() while closed | Returns false | As Expected
getPeerBitmap() before open | Returns empty bitmap | As Expected
getPeerBitmap() while connected | Returns bitmap | As Expected
getPeerBitmap() after connected | Returns empty bitmap | Returns old bitmap (F1)
RequestPiece(valid piece) | Send Request | As Expected
RequestPiece(invalid piece) | Throw OutOfRange | Sent Request (F2)
SendPacket(packet) | Send Packet | As Expected

10.4 Project Proposal
