A Peer-to-Peer File System. OSCAR LAB. Overview: a short introduction to peer-to-peer (P2P) systems; Ivy: a read/write P2P file system (OSDI’02)



<ul><li> Slide 1 </li> <li> A Peer-to-Peer File System. OSCAR LAB </li> <li> Slide 2 </li> <li> Overview: a short introduction to peer-to-peer (P2P) systems; Ivy: a read/write P2P file system (OSDI'02) </li> <li> Slide 3 </li> <li> What is P2P? An architecture of equals (as opposed to client/server): each peer/node acts as client, server, and router. It harnesses aggregate resources (e.g., CPUs, memory, disk capacity) among peers/nodes. </li> <li> Slide 4 </li> <li> What is P2P? Technical trends: creation of a huge pool of available latent resources; increasing processing power of PCs; decreasing cost and increasing capacity of disk space; widespread penetration of broadband. </li> <li> Slide 5 </li> <li> P2P Systems. Centralized: have a centralized directory service, e.g., Napster; this limits scalability and poses a single point of failure. Decentralized and unstructured: no precise control over the network topology or data placement, e.g., Gnutella; controlled message flooding limits scalability. </li> <li> Slide 6 </li> <li> P2P Systems. Decentralized and structured: tightly control the network topology and data placement. Loosely structured: Freenet (file placement is based on hints). Highly structured: Pastry, Chord, Tapestry, and CAN. </li> <li> Slide 7 </li> <li> Decentralized and Highly Structured P2P Systems. Precise control of the network topology and data placement. A distributed hash table (DHash): each node has a host-ID (hash of the public key or IP address). 
Each file/object has a file-ID (hash of the file pathname). Both files and nodes are mapped into the DHash. Basic interface: put(key, value), get(key). </li> <li> Slide 8 </li> <li> Decentralized and Highly Structured P2P Systems. A location and routing infrastructure: application-level, routed by ID rather than IP address; routing efficiency: O(log N). Advantages: good scalability (O(log N) in routing efficiency and routing table size); reliability; self-maintenance (node addition/removal); good performance (compared to other P2P systems). Issues: routing performance (compared to IP routing); security; other issues. </li> <li> Slide 9 </li> <li> P2P Applications: content delivery systems; application-level multicast; publishing/file sharing systems; P2P storage systems (e.g., PAST, CFS, OceanStore); P2P file systems. </li> <li> Slide 10 </li> <li> Ivy: A Read/Write P2P File System. Introduction; Design Issues; Performance Evaluation; Summary. </li> <li> Slide 11 </li> <li> Introduction. Challenges: previous P2P systems are either read-only or single-writer, so multiple writers pose file system consistency issues; unreliable participants render locking unattractive (for consistency); undo/ignore untrusted participants' modifications; security over untrusted storage nodes; resolve update conflicts due to network partitions; high availability vs. 
strong consistency. </li> <li> Slide 12 </li> <li> Design Issues: DHash infrastructure; log-based metadata and data; NFS-like file system. </li> <li> Slide 13 </li> <li> DHash: a distributed P2P hash table that stores participants' logs. Basic operations: put(key, value), get(key). E.g., key = content-hash of a log record, value = log record. </li> <li> Slide 14 </li> <li> Log Data Structure. One log per participant. A log contains all of one participant's modifications (log records) to file system data and metadata. Each log record is a content-hash block. Each participant appends log records only to its own log, but reads from all participants' logs. An untrusted participant's modifications can be ignored by not reading its log. </li> <li> Slide 15 </li> <li> Log Data Structure </li> <li> Slide 16 </li> <li> Slide 17 </li> <li> Slide 18 </li> <li> Using the Log. Append a log record: derive a log record from an NFS request; its prev field points to the last record; insert the new log record into DHash; sign a new log-head pointing to the new log record; insert the new log-head into DHash. </li> <li> Slide 19 </li> <li> Using the Log. File system creation: create a new log with an End record, an Inode record with a random i-number for the root directory, and a log-head; use the root i-number as the NFS root file handle. </li> <li> Slide 20 </li> <li> Using the Log. File creation. Request: create (directory i-number, file name). Create an Inode record with a new random i-number and a Link record; return the i-number to the NFS client as a file handle; if the file is written, create a Write record. File read. Request: read (i-number, offset, length). Scan the logs, accumulating data from Write records that overlap the range to be read, while ignoring data hidden by SetAttr records that indicate file truncation. 
</li> <li> Slide 21 </li> <li> Using the Log. File name lookup. Request: open (directory i-number, file name). Scan the logs for a corresponding Link record; if a corresponding Unlink record is encountered first, the file does not exist. File attributes: file length, mtime, ctime, etc.; scan the logs to incrementally compute attributes. </li> <li> Slide 22 </li> <li> User Cooperation: Views. View: the set of logs comprising a file system. View block: a DHash content-hash block containing pointers to all log-heads in the view; it also contains the root directory i-number. Property: immutable (different file systems have different view blocks). A file system is named by the content-hash key of its view block, like the self-certifying file system (SFS). </li> <li> Slide 23 </li> <li> Combining Logs. Problem: concurrent updates result in conflicts; how should log records be ordered? Solution: a version vector in each log record, used to detect update conflicts. E.g., (A:5, B:7) &lt; (A:6, B:7) are compatible; (A:5, B:7) vs. (A:6, B:6) are concurrent version vectors, ordered by comparing the public keys of the two logs. </li> <li> Slide 24 </li> <li> Snapshots. Problem: the entire log must be traversed to answer requests (high overhead and inefficiency). 
Solution: snapshots. Avoid traversing the entire log; hold a consistent state of the file system; private per participant, constructed periodically; stored in DHash, sharing contents among snapshots; contain a file map, a set of i-nodes, and some data blocks (see Figure 2). </li> <li> Slide 25 </li> <li> Snapshot Data Structure </li> <li> Slide 26 </li> <li> Snapshots. Building snapshots: apply all log records newer than the previous snapshot. Using snapshots: first traverse log records newer than the current snapshot; if this can't fulfill an NFS request, search the current snapshot for further information. Mutually trusting participants can share snapshots. </li> <li> Slide 27 </li> <li> Cache Consistency. Most updates are immediately visible: store the new log record and update the log-head before replying to an NFS request; query the latest log-heads for the latest updates upon each NFS operation. Modified close-to-open consistency for file reads/writes: Open() fetches all log-heads for subsequent reads/writes; Write() writes data to its cache and defers writing data to DHash; Close() pushes log records (if any were produced by writes) and updates the log-head. </li> <li> Slide 28 </li> <li> Exclusive Create. Requirement: creation of directory entries must be exclusive; some applications use this semantics to implement locks. Solution: </li> <li> Slide 29 </li> <li> Partitioned Updates. Close-to-open consistency is guaranteed only if the network is fully connected. What if the network is partitioned? 
Maximize availability (by allowing concurrent updates) and compromise on consistency. After the partition heals, use version vectors to detect conflicts and an application-level resolver to resolve them (Harp). </li> <li> Slide 30 </li> <li> Security and Integrity. Form another view to exclude bad/misbehaving/malicious participants. Use content-hash keys and public-hash keys to protect data integrity. </li> <li> Slide 31 </li> <li> Evaluation. Goal: understand the cost of Ivy's design in terms of network latency and cryptographic operations. Workload: Modified Andrew Benchmark (MAB). Performance in a WAN. </li> <li> Slide 32 </li> <li> Many Logs, One Writer. The number of logs has relatively little impact, because Ivy fetches the log-heads/log-records in parallel. </li> <li> Slide 33 </li> <li> Many DHash Servers. More impact, since more messages are required to fetch log-records. </li> <li> Slide 34 </li> <li> Many Writers. More impact: each participant has to fetch other participants' newly logged updates. </li> <li> Slide 35 </li> <li> Summary. Log-based data/metadata, avoiding locking. Close-to-open consistency. Tradeoff between high availability and strong consistency: allow concurrent updates, detect and resolve update conflicts. Performance: 2-3 times slower than NFS. Limitations? Small scale: limited by the number of logs. Hard to hide wide-area network latency. </li> <li> Slide 36 </li> <li> Thanks </li> </ul>
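The log-append steps from Slide 18 and the content-hash keys from Slide 13 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyDHash`, `append_record`, and `scan_log` are hypothetical names, the table is a local dictionary rather than a distributed one, SHA-1 and JSON encoding are arbitrary choices, the End record is replaced by a `None` prev pointer, and signing of the log-head is omitted.

```python
import hashlib
import json


class ToyDHash:
    """In-memory stand-in for DHash (hypothetical; the real table is distributed)."""

    def __init__(self):
        self.blocks = {}

    def put(self, key, value):
        self.blocks[key] = value

    def get(self, key):
        return self.blocks.get(key)


def content_hash(data: bytes) -> str:
    # Assumption: SHA-1 hex digest as the content-hash key.
    return hashlib.sha1(data).hexdigest()


def append_record(dhash, log_id, record_type, payload):
    """Append one record to the log named log_id and return the record's key."""
    head = dhash.get(log_id)  # log-head: key of the newest record, or None
    record = {"type": record_type, "payload": payload, "prev": head}
    data = json.dumps(record, sort_keys=True).encode()
    key = content_hash(data)
    dhash.put(key, data)      # insert the new log record first
    dhash.put(log_id, key)    # then publish the new log-head (signature omitted)
    return key


def scan_log(dhash, log_id):
    """Yield records from newest to oldest by following prev pointers."""
    key = dhash.get(log_id)
    while key is not None:
        data = dhash.get(key)
        # A content-hash block can be verified by rehashing its contents.
        assert content_hash(data) == key
        record = json.loads(data)
        yield record
        key = record["prev"]
```

Writing the record before the log-head means a reader that fetches the head never follows a pointer to a block that has not been stored yet.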
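The version-vector comparison from Slide 23 can be illustrated with a short Python sketch. The function names and the record layout (a dict with `vv` and `pubkey` fields) are assumptions made for this example, not Ivy's actual data format; missing entries in a vector are treated as zero.

```python
def compare(u, v):
    """Return 'before', 'after', 'equal', or 'concurrent' for version vectors u and v."""
    names = set(u) | set(v)
    u_le_v = all(u.get(n, 0) <= v.get(n, 0) for n in names)
    v_le_u = all(v.get(n, 0) <= u.get(n, 0) for n in names)
    if u_le_v and v_le_u:
        return "equal"
    if u_le_v:
        return "before"
    if v_le_u:
        return "after"
    return "concurrent"  # neither vector dominates: an update conflict


def order_records(a, b):
    """Order two log records; concurrent records are ordered by public key."""
    relation = compare(a["vv"], b["vv"])
    if relation in ("before", "equal"):
        return [a, b]
    if relation == "after":
        return [b, a]
    # Concurrent: fall back to a deterministic tie-break on the logs' public keys.
    return sorted([a, b], key=lambda r: r["pubkey"])


# The two cases from the slide:
print(compare({"A": 5, "B": 7}, {"A": 6, "B": 7}))
print(compare({"A": 5, "B": 7}, {"A": 6, "B": 6}))
```

The tie-break does not resolve the conflict; it only ensures that every participant applies concurrent records in the same order.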
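The file-read procedure from Slide 20 (scan the logs, collect overlapping Write records, ignore data hidden by truncation) might look roughly like the sketch below. It assumes the records have already been merged into one newest-first list of dicts with hypothetical fields `type`, `inum`, `offset`, `data`, and `size`; it also simplifies by returning zero bytes for any position no visible Write covers, instead of clamping the result to the file length.

```python
def read_range(records, inum, offset, length):
    """Return the bytes of [offset, offset + length) for file inum.

    records must be ordered newest-first, so the first Write seen for a
    byte position is the one that wins.
    """
    result = [None] * length  # None means "byte not determined yet"
    limit = None              # smallest truncation size seen in newer records
    for rec in records:
        if rec.get("inum") != inum:
            continue
        if rec["type"] == "SetAttr" and "size" in rec:
            # A truncation hides all older data at or beyond the new size.
            limit = rec["size"] if limit is None else min(limit, rec["size"])
        elif rec["type"] == "Write":
            for i, byte in enumerate(rec["data"]):
                pos = rec["offset"] + i
                if limit is not None and pos >= limit:
                    continue  # hidden by a later truncation
                idx = pos - offset
                if 0 <= idx < length and result[idx] is None:
                    result[idx] = byte
    return bytes(b if b is not None else 0 for b in result)
```

For example, with an old Write of `b"hello"` at offset 0, a later truncation to size 3, and a newest Write of `b"HE"` at offset 0, reading the first three bytes gives `b"HEl"`.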

