
Data Comm. Assignment1


    ASSIGNMENT: DATA COMMUNICATION (KEEW3202)

    STUDENT: INDIRA KARIMOVA (KEW100701)

    INTRODUCTION

    [1] When a person accesses information on a device such as a PC, laptop, PDA, or cell phone, the data might not be physically stored on that device. In this case, a request to access the information must be made to the device where the data resides. Such a request can be made and fulfilled using the client/server model, application layer services and protocols, and peer-to-peer (P2P) networking and applications. The P2P model takes two distinct forms: peer-to-peer network design and peer-to-peer applications. Both forms have similar features but work differently. Today's networks are dominated by the TCP/IP protocol suite, which naturally suits the P2P model. However, a P2P system also needs to provide the following services, on which it depends [2]:

    a) Subscription service: used by current members to accept or reject new subscriptions to a group. A peer wishing to join a peer group must first locate a current member and then request to join.

    b) Discovery service: used by peer members to search for peer-group resources. Only the peers that are currently logged on are searched.

    c) Peer monitoring service: keeps close track of a peer's status. Such a service is useful when features such as reliability and guaranteed service times are to be provided to the subscribers of a P2P network.

    d) Access service: used to validate requests made by one peer to another. The peer requiring data from another peer provides its credentials and the particulars of the request being made. The access service has to determine whether the access is permitted and whether the request is warranted.
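    The four services can be pictured with a small in-memory sketch. It is only an illustration: the class and method names, the accept flag standing in for the sponsor's decision, and the per-peer allowed list are all assumptions, not part of any real P2P framework.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    online: bool = True
    resources: set = field(default_factory=set)
    # Hypothetical access list: names of peers allowed to read from this peer.
    allowed: set = field(default_factory=set)


class PeerGroup:
    def __init__(self, founder):
        self.members = {founder.name: founder}

    def subscribe(self, sponsor_name, newcomer, accept):
        """Subscription: a current member accepts or rejects the newcomer."""
        if sponsor_name not in self.members or not accept:
            return False
        self.members[newcomer.name] = newcomer
        return True

    def discover(self, resource):
        """Discovery: search only the peers that are currently logged on."""
        return sorted(p.name for p in self.members.values()
                      if p.online and resource in p.resources)

    def status(self, name):
        """Peer monitoring: report whether a member is currently reachable."""
        return "online" if self.members[name].online else "offline"

    def access(self, requester, owner_name, resource):
        """Access: validate a request made by one peer to another."""
        owner = self.members[owner_name]
        return requester in owner.allowed and resource in owner.resources


alice = Peer("alice", resources={"notes.txt"}, allowed={"bob"})
bob = Peer("bob")
group = PeerGroup(alice)
group.subscribe("alice", bob, accept=True)
print(group.discover("notes.txt"))
print(group.access("bob", "alice", "notes.txt"))
```

    A real system would run each service over the network; here everything lives in one process so that the division of responsibilities stays visible.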

    P2P SYSTEMS

    Peer-to-peer systems have been defined in many papers. Here are two definitions that cover the concepts of a peer-to-peer network and a peer-to-peer system [3]:

    “Distributed network architecture may be called a peer-to-peer network, if the participants share a part of their own hardware resources (processing power, storage capacity, network link capacity, printers). These shared resources are necessary to provide the Service and content offered by the network (e.g. file sharing or shared workspaces for collaboration). They are accessible by other peers.”

    “Peer-to-peer systems are distributed systems consisting of interconnected nodes able to self-organize into network topologies with the purpose of sharing resources such as content, CPU cycles, storage and bandwidth, capable of adapting to failures and accommodating transient populations of nodes while maintaining acceptable connectivity and performance, without requiring the intermediation or support of a global centralized server or authority.”

    [1] To better appreciate the P2P model, let's briefly review the client/server model. In the client/server model, the device requesting the information is called a client and the device responding to the request is called a server. This model is considered to be part of the application layer. The server is the place where resources are stored. The client (a PC host) requests a file from the server, and the server responds by transferring the file to the client. In a similar manner, the client can also transfer a file to the server for storage.

    In a P2P network, a dedicated server is not required. Multiple computers can be connected through a network to share resources such as printers and files without needing the assistance of a server. Each connected end device is called a peer and can function as either a server or a client on a per-request basis. One computer might also assume the roles of both server and client at the same time for several simultaneous transactions.

    Figure 1: Example of P2P networks


    A P2P application, unlike a peer-to-peer network, allows a device to act as both a client and a server within the same communication session. P2P applications can be used on peer-to-peer networks, in client/server networks, and across the Internet. Figure 2 shows two phones on the same network exchanging an instant message, with the digital traffic between the two phones shown on top. Both can initiate a communication and are considered equal in the communication process. However, each end device needs to provide a user interface and run a background service. When you launch a specific peer-to-peer application, it invokes the required user interface and background services. After that, the devices can communicate directly.

    Figure 2: Example of P2P applications

    [7] Devices need to have a P2P program installed that creates a virtual network among the community of P2P users. To the users it appears as if their devices are in a P2P network, allowing them to share files with other users and download files shared by other users. This is very similar to instant messaging services such as Yahoo, AOL, or GTalk: even though the person we are communicating with is on a different network, a virtual network is created in which it looks as if we are on the same network, and we can share files and chat.

    CHARACTERISTICS OF MOST P2P SYSTEMS [3]

    a) Resource sharing: each peer contributes system resources to the operation of the P2P system. Ideally this resource sharing is proportional to the peer's use of the P2P system, but many systems suffer from the free rider problem (peers that consume resources without contributing any).

    b) Networked: all nodes are interconnected with other nodes in the P2P system, and the full set of nodes forms a connected graph. When the graph is no longer connected, the overlay is said to be partitioned.

    c) Decentralization: the behavior of the P2P system is determined by the collective actions of peer nodes, and there is no central control point.

    d) Symmetry: nodes assume equal roles in the operation of the P2P system. In many designs this property is relaxed by the use of special peer roles such as super peers or relay peers.

    e) Autonomy: participation of the peer in the P2P system is determined locally, and there is no single administrative context for the P2P system.

    f) Self-organization: the organization of the P2P system increases over time using local knowledge and local operations at each peer, and no peer dominates the system.

    g) Scalability: this is a prerequisite for operating P2P systems with millions of simultaneous nodes, and means that the resources used at each peer grow less than linearly as a function of overlay size. It also means that the response time does not grow more than linearly as a function of overlay size.

    h) Stability: within a maximum churn rate, the P2P system should be stable, i.e., it should maintain its connected graph and be able to route deterministically within practical hop-count bounds.

    P2P SYSTEMS ARCHITECTURES

    [3] In a pure P2P network, there is no notion of clients or servers, only equal peer nodes that simultaneously function as both "clients" and "servers" to the other nodes on the network. This differs from the client/server model, where communication is usually to and from a central server.

    Modern peer-to-peer systems are often implemented using an abstract overlay network, built at the application layer on top of the native or physical network topology. Such overlays are used for indexing and peer discovery and make the P2P system independent of the physical network topology. Content is typically exchanged directly over the underlying Internet Protocol (IP) network. The two main P2P overlay architectures are the unstructured P2P overlay architecture and the structured P2P overlay architecture. We give only an overview of the structured P2P overlay and discuss the unstructured P2P overlay in more detail. An overlay network is defined as follows:


    An overlay network is an application layer virtual or logical network in which end points are addressable and that provides connectivity, routing, and messaging between end points. Overlay networks are frequently used as a substrate

    for deploying new network services, or for providing a routing topology not available from the underlying physical

    network. Many peer-to-peer systems are overlay networks that run on top of the Internet.

    a) Unstructured P2P overlay architecture [4]

    An unstructured overlay is an overlay in which a node relies only on its adjacent nodes for delivery of messages to other nodes in the overlay. Example message propagation strategies are flooding and random walk. Our study will focus on the file sharing application, which is one of the most important applications for P2P networks. Unstructured overlays can be

    further classified into centralized, distributed, hybrid and some other approaches for file sharing.

    Figure 3: Search process in unstructured P2P networks. (a) Napster (b) Gnutella (c) Kazaa (d) BitTorrent

    i. A Centralized Approach: Napster

    The Napster file-sharing system consists of a central directory server and a set of registered users (or peers). The server maintains information about all files in the system, including an index with metadata (such as file name and size) for every file, a list of all registered peers, and a list of the files that each peer holds and shares. When a new peer joins the system, it contacts the server and reports the list of files it holds and shares. When a peer wants to search for a file, it sends a request to the server. The server returns a list of peers that hold the matching file. The searching peer then contacts the returned peers to download the file. Figure 3(a) shows the search process in Napster. When peer A wants to search for some file, it contacts the central server. The server returns some peers that hold the file, say, peer B. Peer A then starts to download the file from peer B.

    Napster is easy to implement because only a central server needs to be deployed and maintained. The system is also highly adaptive to peers joining and leaving. However, it is not scalable: the server needs substantial resources (such as computational capability and bandwidth) to support a large number of peers. In addition, the server forms a single point of failure. If the server is down, the whole system stops working.
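    A minimal sketch of the centralized approach follows. It models only the directory lookup, not Napster's actual wire protocol; the class name DirectoryServer and its methods are hypothetical.

```python
from collections import defaultdict


class DirectoryServer:
    """Central index: maps each file name to the set of peers sharing it."""

    def __init__(self):
        self.index = defaultdict(set)

    def register(self, peer, files):
        # A joining peer reports the list of files it shares.
        for name in files:
            self.index[name].add(peer)

    def unregister(self, peer):
        # A leaving peer is removed from every entry.
        for holders in self.index.values():
            holders.discard(peer)

    def search(self, name):
        # The server only returns peer names; the download is peer to peer.
        return sorted(self.index.get(name, set()))


server = DirectoryServer()
server.register("B", ["song.mp3", "talk.ogg"])
server.register("C", ["song.mp3"])
print(server.search("song.mp3"))   # ['B', 'C']
server.unregister("B")
print(server.search("song.mp3"))   # ['C']
```

    Every search goes through the single server object, which is exactly why the design is simple to build and also why it is a single point of failure.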

    ii. A Distributed Approach: Gnutella

    In the basic Gnutella protocol, when a new peer joins the system, it first connects to some public peers and then sends a PING message to every peer it is connected to, announcing its existence. Upon receiving a PING message, a Gnutella peer returns a PONG message and propagates the PING message to its neighbors. In a dynamic network with frequent peer joining and leaving, a peer periodically sends PING messages to its neighbors. Search in Gnutella is based on flooding, that is, broadcasting in the overlay. To reduce the number of query messages in the network, each query message contains a time-to-live (TTL) field. The TTL is decremented each time the query is forwarded, and the query is discarded when the TTL reaches zero.

    Figure 3(b) shows the search process in Gnutella. Suppose peer A wants to search for some file. It floods its search query to its neighbors, i.e., peers B and D in the figure. When peer B receives the query, it checks whether it holds the matching file. If not, it forwards the query to its neighbors. In the example, peer B forwards the query to its neighbor C. Suppose C holds the file that A wants. C returns a response to the peer that sent it the query, which is B in the figure. B then forwards the response to the query sender A. Finally, A contacts C to download the file.

    Unlike Napster, Gnutella is a dynamic, self-organized network. Each peer independently connects to and communicates with a few other peers in the system. The system is highly robust to peer dynamics through the exchange of PING and PONG messages. A limitation of Gnutella is its relatively low search efficiency, because flooding generates a large number of query messages.
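    The flooding search with a TTL can be sketched as a breadth-first traversal of the overlay. This is a simplified model rather than the Gnutella message format: the function name, the dictionary representation of the overlay, and the choice to count every transmitted query (including duplicates that the receiver discards) are assumptions made for illustration. The topology mirrors the A, B, C, D example above.

```python
from collections import deque


def flood_search(overlay, files, start, wanted, ttl):
    """Flood a query from `start`; return (holders, messages_sent)."""
    seen = {start}
    queue = deque([(start, ttl)])
    holders, messages = [], 0
    while queue:
        peer, remaining = queue.popleft()
        if peer != start and wanted in files.get(peer, ()):
            holders.append(peer)
            continue  # a peer that holds the file answers instead of forwarding
        if remaining == 0:
            continue  # TTL expired: the query is dropped here
        for neighbor in overlay[peer]:
            messages += 1  # every transmission counts, even a duplicate
            if neighbor not in seen:  # duplicate queries are discarded
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return holders, messages


overlay = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"], "D": ["A"]}
files = {"C": {"song.mp3"}}

print(flood_search(overlay, files, "A", "song.mp3", ttl=2))  # (['C'], 5)
print(flood_search(overlay, files, "A", "song.mp3", ttl=1))  # ([], 2)
```

    With a TTL of 1 the query never reaches C, which shows the trade-off: a small TTL saves messages but can miss files that exist in the overlay.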


    iii. A Hybrid Approach: FastTrack/Kazaa

    The hybrid approach combines the approaches used in purely centralized and purely distributed networks to overcome their limitations. FastTrack is a typical example of a partially centralized P2P protocol. In FastTrack, peers with the fastest Internet connections and the most powerful computers are automatically designated as supernodes. A supernode maintains information about some resources and connections with other supernodes. A peer first searches for the closest supernode, which returns immediate results, if any, and refers the search to other supernodes if needed. Two practical applications based on FastTrack are Kazaa and Grokster, although the latter closed its service in 2005 because of copyright issues.

    Figure 3(c) shows the search process in Kazaa. When peer A wants to search for some file, it sends the search query to the closest supernode. The supernode either returns some matching peers or forwards the query to other supernodes. Finally, A obtains some matching peers from the supernode (say, peer B in the figure) and downloads the file from these peers. An ordinary peer (e.g., peer A in the figure) therefore communicates with a supernode as if communicating with the server in Napster. A Gnutella-like search is then performed in a highly pruned overlay network of supernodes.

    Kazaa achieves much lower search times than purely distributed networks like Gnutella. Search among supernodes is much faster than search among all peers, because the number of supernodes is much smaller than the total number of peers. The high bandwidth and large storage space of supernodes allow them to process a large number of queries from ordinary peers efficiently. The system hence makes good use of peer heterogeneity. In addition, unlike Napster, it does not have a single point of failure. If some supernodes go down, the peers connected to them can connect to other supernodes.
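    The two-level search can be sketched as follows. This is not the FastTrack protocol, whose details are not given here; the SuperNode class, its fields, and the recursive forwarding are hypothetical and only illustrate the idea of answering locally first and forwarding among supernodes otherwise.

```python
class SuperNode:
    def __init__(self, name):
        self.name = name
        self.index = {}       # file name -> set of ordinary peers holding it
        self.neighbors = []   # other supernodes

    def register(self, peer, files):
        # An ordinary peer reports its files to its supernode.
        for f in files:
            self.index.setdefault(f, set()).add(peer)

    def search(self, wanted, visited=None):
        # Answer from the local index if possible, else ask other supernodes.
        visited = set() if visited is None else visited
        visited.add(self.name)
        if wanted in self.index:
            return sorted(self.index[wanted])
        for other in self.neighbors:
            if other.name not in visited:
                found = other.search(wanted, visited)
                if found:
                    return found
        return []


s1, s2 = SuperNode("S1"), SuperNode("S2")
s1.neighbors.append(s2)
s2.neighbors.append(s1)
s1.register("A", ["notes.txt"])
s2.register("B", ["movie.avi"])

print(s1.search("movie.avi"))  # ['B']
```

    Ordinary peers never take part in the search itself, so the number of nodes a query touches is bounded by the number of supernodes.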

    iv. Other Approach: BitTorrent

    BitTorrent is a P2P system that does not belong to any of the above categories. BitTorrent uses a central location to coordinate data upload and download among peers. To share a file f, a peer first creates a small torrent file, which contains metadata about f, e.g., its length, name and hashing information. Usually, BitTorrent cuts a file into pieces of fixed size, typically between 64 KB and 4 MB each. Each piece has a checksum computed with the SHA-1 hashing algorithm, which is also recorded in the torrent file. Most importantly, the torrent file contains the URL of a tracker, which keeps track of all the peers that have file f (either partially or completely) and of the peers looking for it. A peer that wants to download the file first obtains the corresponding torrent file and then connects to the specified tracker. The tracker responds with a random list of peers that are downloading the same file. The requesting peer then connects to these peers to download.

    Figure 3(d) shows the search process in BitTorrent. When peer A wants to search for some file, it first needs to obtain the corresponding torrent for the file. From the torrent, A learns the address of the tracker and connects to it. The tracker then returns a list of peers that are downloading or sharing the file. A then exchanges data with these peers.
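    The per-piece checksums described above can be sketched with Python's standard hashlib module. The piece size, the sample data, the file name, the tracker URL, and the plain dictionary used for the metadata are all assumptions for illustration; a real torrent file is a bencoded structure, which this sketch does not attempt to produce.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # assumed piece size, within the 64 KB to 4 MB range


def piece_hashes(data, piece_size=PIECE_SIZE):
    """Cut the content into fixed-size pieces and hash each one with SHA-1."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]


def verify_piece(index, piece, hashes):
    """A downloader checks a received piece against the recorded hash."""
    return hashlib.sha1(piece).digest() == hashes[index]


data = bytes(range(256)) * 4096  # 1 MiB of sample content
hashes = piece_hashes(data)

# Simplified stand-in for the torrent metadata (hypothetical values).
torrent = {
    "name": "sample.bin",
    "length": len(data),
    "piece_length": PIECE_SIZE,
    "pieces": hashes,
    "tracker": "http://tracker.example/announce",
}

print(len(torrent["pieces"]))                        # 4
print(verify_piece(0, data[:PIECE_SIZE], hashes))    # True
print(verify_piece(1, b"x" * PIECE_SIZE, hashes))    # False
```

    Because each piece is verified independently, a peer can accept pieces from many different peers and discard only the ones that fail the check.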

    The centralization of trackers in BitTorrent systems brings some limitations. If a tracker is down, peers cannot start sharing (by uploading their torrents to the tracker), and new incoming peers cannot start downloading. To overcome this, the latest BitTorrent clients (e.g., µTorrent, BitComet, KTorrent) implement a decentralized tracking mechanism. In this mechanism, every peer acts as a mini-tracker. Peers first join a distributed hash table (DHT) network, which is built into the BitTorrent client. A torrent is then stored at a certain peer according to the DHT storage method. All peers in the DHT network can search for the torrent through DHT search. This mechanism therefore eliminates central trackers from the system.

    b) Structured P2P overlay architecture

    [5] A structured P2P overlay is a network overlay that connects nodes using a particular data structure or protocol to ensure that node lookup or data discovery is deterministic. Early P2P systems mainly consisted of unstructured overlays that organize nodes into random data structures. These unstructured overlays use techniques such as walking or flooding the nodes in the system for lookup, and are often optimized for some common lookup queries. In general, however, unstructured overlays are quite unpredictable when finding rare items and for some real-time applications such as voice and video sharing. To overcome these issues, structured overlays were developed to provide deterministic bounds on data discovery. Structured overlays provide scalable network overlays based on a distributed data structure that supports deterministic behavior for data lookup. Structured P2P overlays impose restrictions on node placement in the overlay and hence improve the efficiency of data lookup. Structured P2P systems can be categorized in terms of the bound on the number of hops required for data lookup, and they raise issues such as node lookup, finger table maintenance, and the join/leave properties of the overlays.

    [3] Each peer has a local routing table which is used by the forwarding algorithm. The peer's routing table is initialized when the peer joins the overlay, using a specified bootstrap procedure. Peers periodically exchange routing table changes as part of overlay maintenance. The majority of structured overlays use key-based routing, in which a set of keys is associated with addresses in the address space such that the nearest peer to an address stores the values for the associated keys, and the routing algorithm treats keys as addresses. A distributed hash table (DHT) is a structured overlay that uses key-based routing for put and get index operations and in which each peer is assigned to maintain a portion of the DHT index. Because the address space is virtualized and peer addresses are typically randomly assigned, peers which are neighbors in the overlay can be distant in the underlying network. While this improves the fault tolerance of the overlay, it causes significant performance loss. Consequently, topology-aware overlays use measurements of the proximity of peers in the underlying network to select neighbor peers in the overlay.
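    Key-based routing with put and get can be sketched as below. The sketch makes several simplifying assumptions that are not from the source: a 16-bit circular address space, addresses derived from SHA-1, and a global view of all peers in place of per-peer routing tables and hop-by-hop forwarding. The names TinyDHT, address and distance are hypothetical.

```python
import hashlib

SPACE = 2 ** 16  # size of the assumed circular address space


def address(text):
    """Map a peer name or a key to an address in the shared space."""
    digest = hashlib.sha1(text.encode()).digest()
    return int.from_bytes(digest[:2], "big")


def distance(a, b):
    """Circular distance between two addresses."""
    d = abs(a - b)
    return min(d, SPACE - d)


class TinyDHT:
    def __init__(self, peer_names):
        self.peers = {address(n): n for n in peer_names}
        self.store = {n: {} for n in peer_names}

    def responsible(self, key):
        # The peer whose address is nearest to the key's address stores it.
        target = address(key)
        nearest = min(self.peers, key=lambda pid: (distance(pid, target), pid))
        return self.peers[nearest]

    def put(self, key, value):
        self.store[self.responsible(key)][key] = value

    def get(self, key):
        return self.store[self.responsible(key)].get(key)


names = ["peer-1", "peer-2", "peer-3", "peer-4"]
dht = TinyDHT(names)
dht.put("song.mp3", "held by peer-7")
print(dht.responsible("song.mp3"), dht.get("song.mp3"))
```

    Because keys and peers share one address space, any peer can compute where a key belongs without asking a central index; a real DHT reaches that peer through a bounded number of routing hops.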

    ADVANTAGES AND DISADVANTAGES OF P2P NETWORKS AND APPLICATIONS [2]

    P2P networks have the advantage of increasing the availability of resources, whether computational resources or content, by sharing them between peers in the same network. They also provide enhanced load balancing. When a piece of data is present only at a particular peer, that peer may become overburdened with requests. P2P can circumvent this problem by providing multiple copies of the data. In addition, with explicit caching algorithms, intermediate peers cache frequently used data and help to distribute the content more evenly, so the query load is more evenly balanced. P2P networks also provide redundancy and fault tolerance. If a peer in the network goes down, we can rely on other peers to perform the required task or to act as a source of the same data, because data is duplicated quickly in the P2P model. Besides that, P2P networks enable content-based addressing. On the present Internet, there may be very little correspondence between the site name a person types and the site's contents. In P2P, the exact address of the node storing a particular piece of content remains transparent to the user. The user queries the network for the content, and the P2P software translates the request into the specific nodes that hold the content. This procedure can lead to a grouping of addresses based on the content the respective nodes store, which in turn can lead to a more refined data repository.

    P2P networks have the disadvantage of spurious content and poor connections. Because there is no central authority, the quality of the content posted on the peer group is questionable. For example, the MP3 version of the same song may be available as one copy with very good sound quality and another copy with poor quality. To the P2P search, however, both versions are part of the same result and are indistinguishable until actually heard. Also, slow and error-prone dial-up connections used by some of the peers may disrupt the normal functioning of the network. P2P networks also have numerous security considerations, which are discussed in the next section.

    SECURITY CONSIDERATIONS OF P2P [8]

    Security is an important issue when implementing a system. The first issue to consider is the extent to which the nodes in the system can be trusted. If all the nodes in the system are fully trusted (that is, trusted never to act maliciously), a P2P architecture can achieve a high level of security. However, if nodes are not fully trusted and can be expected to behave maliciously, providing an acceptable level of security in a P2P environment becomes significantly more challenging because of its distributed ownership and lack of centralized control.

    P2P networks decentralize the resources on a network, so information can be located anywhere on any connected device without the need for a dedicated server. However, this makes it more difficult to enforce security and access policies in networks that have many computers. User accounts and access rights must be set individually on each peer device.

    P2P allows attackers to passively obtain valid IP addresses of potential victims without performing active scans, because a given peer is typically connected to multiple peers. This is much more efficient than scanning when the address space is large and sparsely populated, as the IPv6 address space currently is. Additionally, because a particular application is often highly correlated with a particular operating system, an attacker can launch attacks that exploit known vulnerabilities of that operating system.

    Central elements in centralized architectures become an obvious target for attacks. P2P systems minimize the number of central elements and are thus more resilient against attacks targeted at only a few elements. Besides that, it is also important to consider a number of threats that are specific to P2P systems, which mainly target their data storage and routing functions.

    In a P2P system, messages between two given peers generally traverse a set of intermediate peers that help route messages between the two peers. Intermediate peers compromised by an attacker can attempt man-in-the-middle attacks, since they are on the path between the two given peers. The Sybil attack is an example of such an attack. This type of attack can be mitigated by controlling how peers obtain their identifiers, for example by having a central authority assign them. We can also encrypt the message parts that are not required for routing. Without the key to decrypt the message, the attacker cannot view the actual message content. Attackers can also attack the routing of the P2P system itself, modifying it in order to be able to launch on-path attacks. Attackers can use forged routing maintenance messages for this purpose. The Eclipse attack is an example of such an attack. Enforcing structural constraints or node degree bounds can mitigate this type of attack.

    An attacker can create a message and claim that it was actually created by another peer. The attacker can even

    take a legitimate message as a base and modify it to launch the attack. Peer and message authentication techniques can be

    used to avoid this type of attack.
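    One possible message authentication technique is a keyed hash (HMAC), sketched below with Python's standard hmac module. The source does not say which technique is meant, so this is only an example; it assumes the two peers already share a secret key, and the key and message contents are made up.

```python
import hashlib
import hmac


def sign(key, message):
    """Compute an authentication tag that only holders of the key can produce."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    """Check the tag in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign(key, message), tag)


shared_key = b"hypothetical-shared-secret"
message = b"peer C holds song.mp3"
tag = sign(shared_key, message)

print(verify(shared_key, message, tag))                       # True
print(verify(shared_key, b"peer X holds song.mp3", tag))      # False
```

    A forged or modified message fails verification because the attacker cannot compute a matching tag without the key. Distributing keys among peers that do not trust each other is a separate problem that this sketch does not address.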

    In P2P-specific attacks against the data storage function of a P2P system, an attacker can refuse to store a particular data object, or claim that a particular data object does not exist even though another peer created it and stored it on the attacker's node. These are Denial-of-Service (DoS) attacks and can be mitigated by using data replication techniques and performing multiple, typically parallel, searches. It is also possible to launch DoS attacks by modifying or dropping routing maintenance messages or by creating forged ones; this can be mitigated by having nodes obtain routing tables from multiple peers. Attackers can also launch a DoS attack by creating churn. By leaving and joining a P2P overlay rapidly many times, a set of attackers can create large amounts of maintenance traffic and make the routing structure of the overlay unstable. We can mitigate this by limiting the amount of churn per node.
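    Limiting churn per node could, for instance, take the form of a sliding-window limit on how often the same node may rejoin. The sketch below is one hypothetical policy, not a mechanism described in the source; the class name, the parameters and the time values are assumptions, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import defaultdict, deque


class ChurnLimiter:
    """Allow at most `max_joins` joins per node within `window` seconds."""

    def __init__(self, max_joins, window):
        self.max_joins = max_joins
        self.window = window
        self.joins = defaultdict(deque)  # node id -> timestamps of recent joins

    def allow_join(self, node_id, now):
        recent = self.joins[node_id]
        # Forget joins that have fallen out of the time window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_joins:
            return False  # too much churn from this node: refuse the join
        recent.append(now)
        return True


limiter = ChurnLimiter(max_joins=2, window=60.0)
results = [limiter.allow_join("n1", t) for t in (0, 10, 20, 70)]
print(results)  # [True, True, False, True]
```

    Such a limit is only as strong as the node identifiers behind it, which is another reason why controlling how peers obtain identifiers matters.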

    CONCLUSION

    P2P systems provide many new opportunities for communicating, sharing resources, and computing over the Internet. Advances in software and hardware technology have made P2P systems easier to realize. Although P2P systems still have numerous disadvantages and security considerations, many innovative ideas are being developed and much effort is being made to improve P2P technology.

    REFERENCES

    [1] Dye, M. A., McDonald, R., & Rufi, A. W. (2008). Application Layer Functionality and Protocols. Network Fundamentals CCNA Exploration Companion Guide (pp. 63-98). Indianapolis, IN: Cisco Press.

    [2] Kini, U. A., & Shetty, S. M. (2001). Peer-to-Peer networking. Resonance, 6(12), 69-79.

    [3] Shen, X., Yu, H., Buford, J., & Akon, M. (2009). Introduction to Peer-to-Peer Networking. Handbook of Peer-to-Peer Networking (pp. 44-154). New York, NY: Springer.

    [4] Shen, X., Yu, H., Buford, J., & Akon, M. (2009). Unstructured P2P Overlay Architectures. Handbook of Peer-to-Peer Networking (pp. 155-256). New York, NY: Springer.

    [5] Shen, X., Yu, H., Buford, J., & Akon, M. (2009). Structured P2P Overlay Architectures. Handbook of Peer-to-Peer Networking (pp. 257-435). New York, NY: Springer.

    [6] Wikipedia. Peer-to-peer. Retrieved March 20, 2013, from http://en.wikipedia.org/wiki/Peer-to-peer

    [7] Vikran, K. (2009, November). What do P2P Applications do and How to block Peer to Peer Applications (P2P) using Symantec Endpoint Protection? Retrieved March 20, 2013, from http://www.symantec.com/connect/articles/what-do-p2p-applications-do-and-how-block-peer-peer-applications-p2p-using-symantec-endpoin

    [8] Internet Engineering Task Force (IETF). (2009, November). RFC 5694 - Peer-to-Peer (P2P) Architecture: Definition, Taxonomies, Examples, and Applicability. Retrieved March 15, 2013, from http://tools.ietf.org/html/rfc5694