Lecture 21 Distributed Systems. Checkpoint In journaling

  • Published on
    18-Dec-2015

Transcript

  • Slide 1
  • Lecture 21 Distributed Systems
  • Slide 2
  • Checkpoint In journaling
  • Slide 3
  • Metadata Journaling. 1. Data write: write data to its final location; wait for completion (the wait is optional: steps 1 and 2 may be issued concurrently, but both must complete before step 3). 2. Journal metadata write: write the begin block (TxB) and metadata to the log; wait for the writes to complete. 3. Journal commit: write the transaction commit block (containing TxE) to the log; wait for the write to complete; the transaction (including data) is now committed. 4. Checkpoint metadata: write the contents of the metadata update to their final locations within the file system. 5. Free: later, mark the transaction free in the journal superblock.
  • Slide 4
  • Checkpoint. In journaling: write the contents of the update to their final locations within the file system. In LFS: the checkpoint region sits at a special fixed position on disk; it contains the addresses of all imap blocks, the current time, the address of the last segment written, etc. What if we checkpoint too often?
  • Slide 5
  • LFS review. Good for? Bad for?
  • Slide 6
  • Disk after Creating Two Files
  • Slide 7
  • Garbage Collection. Pick M segments, compact them into N (where N < M). Mechanism: how do we know whether the data in a segment is still valid? A segment summary lists the inode corresponding to each data block. Policy: which segments to compact? A hot segment: its contents are frequently overwritten. A cold segment: may have a few dead blocks, but the rest of its contents are relatively stable.
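A minimal sketch of the liveness check described on the slide above, assuming a toy in-memory model. The names `segment_summary`, `imap`, `disk`, `is_live`, and `clean` are hypothetical, and a real LFS keeps these structures on disk in a different form. The idea is only this: a block is live if the inode named in the segment summary still points at that block's address.

```python
# Toy model of LFS liveness checking (all names are illustrative assumptions).

def is_live(block_addr, segment_summary, imap, disk):
    """Return True if the data block at block_addr is still referenced."""
    inode_num, offset = segment_summary[block_addr]
    inode_addr = imap.get(inode_num)
    if inode_addr is None:
        return False                      # the file no longer exists
    inode = disk[inode_addr]              # here an inode is a list of block addresses
    return offset < len(inode) and inode[offset] == block_addr


def clean(segment_blocks, segment_summary, imap, disk):
    """Return the addresses a cleaner would copy into a new segment."""
    return [a for a in segment_blocks if is_live(a, segment_summary, imap, disk)]


# File 7 had its block 0 written at address 0, then overwritten at address 1.
disk = {
    0: "old contents of file 7, block 0",
    1: "new contents of file 7, block 0",
    2: [1],                               # latest inode of file 7: block 0 lives at address 1
}
segment_summary = {0: (7, 0), 1: (7, 0)}  # address -> (inode number, offset in file)
imap = {7: 2}                             # inode number -> address of latest inode

live = clean([0, 1], segment_summary, imap, disk)
print(live)  # [1]
```

Only address 1 survives cleaning; address 0 is dead because the current inode no longer points to it.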
  • Slide 8
  • Recovery. In journaling: if the crash happens before the transaction commit completes, skip the pending update; if it happens after the commit completes (but before the checkpoint finishes), the transaction is replayed. In LFS: identify the newest consistent checkpoint region, then roll forward: scan BEYOND the last checkpoint to recover as much data as possible.
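The journaling half of the recovery slide above can be sketched as follows. This is an illustration under stated assumptions, not how any particular file system encodes its log: the record tuples (`"TxB"`, `"UPDATE"`, `"TxE"`) and the `recover` function are hypothetical. Only transactions whose commit record reached the log are replayed; anything without a `TxE` is skipped.

```python
def recover(journal, fs):
    """Replay committed transactions from the journal into fs (a dict)."""
    pending = {}                                  # transaction id -> list of (addr, value)
    for record in journal:
        kind, tid = record[0], record[1]
        if kind == "TxB":
            pending[tid] = []
        elif kind == "UPDATE" and tid in pending:
            pending[tid].append(record[2:])       # (addr, value)
        elif kind == "TxE" and tid in pending:
            for addr, value in pending.pop(tid):  # committed: safe to replay
                fs[addr] = value
    return fs                                     # uncommitted updates were never applied


journal = [
    ("TxB", 1), ("UPDATE", 1, "inode 5", "v2"), ("TxE", 1),
    ("TxB", 2), ("UPDATE", 2, "bitmap", "v9"),    # crash before TxE was written
]
fs = recover(journal, {"inode 5": "v1", "bitmap": "v8"})
print(fs)  # {'inode 5': 'v2', 'bitmap': 'v8'}
```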
  • Slide 9
  • Data Integrity and Protection
  • Slide 10
  • Disk Failure Modes. Fail-stop, as assumed by RAID. Silent faults: latent-sector errors (LSEs), where a disk sector (or group of sectors) has been damaged in some way; and block corruption.
  • Slide 11
  • Handling Latent Sector Errors. How to detect: a storage system tries to access a block, and the disk returns an error. How to fix: use whatever redundancy mechanism the system has to return the correct data.
  • Slide 12
  • Detecting Corruption: The Checksum. Common checksum functions: XOR, Fletcher checksum, cyclic redundancy check (CRC). Collisions are possible: two different blocks can produce the same checksum.
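To make the three checksum families on the slide above concrete, here is a small sketch. The function names `xor_checksum` and `fletcher16` are my own; the XOR variant works byte by byte for brevity (an assumption, since XOR checksums are often computed over larger words), and the CRC comes from Python's standard `zlib.crc32`.

```python
import zlib
from functools import reduce


def xor_checksum(data: bytes) -> int:
    """XOR every byte together into a single byte."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums modulo 255, packed into 16 bits."""
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1


# XOR ignores byte order, so swapping two bytes is a collision.
print(xor_checksum(b"ab") == xor_checksum(b"ba"))   # True
# Fletcher and CRC are position-sensitive and tell these two inputs apart.
print(fletcher16(b"ab") == fletcher16(b"ba"))       # False
print(zlib.crc32(b"ab") == zlib.crc32(b"ba"))       # False
print(hex(fletcher16(b"abcde")))                    # 0xc8f0
```

The swapped-byte case shows why a stronger function costs more but catches more: every checksum maps many inputs to few outputs, so collisions always exist, but they are harder to hit by accident with Fletcher or CRC.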
  • Slide 13
  • Misdirected Writes. Arise in disk and RAID controllers that write the data to disk correctly, except in the wrong location. Fix: store a physical identifier (physical ID), such as the disk and sector number the block belongs to, alongside each checksum.
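A rough sketch of how a physical ID catches a misdirected write, assuming each stored record carries its checksum plus the (disk, sector) it was meant for. The record layout and the names `make_record` and `read_block` are hypothetical.

```python
import zlib


def make_record(disk_id, sector, data):
    """Bundle data with its checksum and the location it is intended for."""
    return {"data": data, "crc": zlib.crc32(data), "id": (disk_id, sector)}


def read_block(disk, disk_id, sector):
    """Classify the record found at the given sector."""
    rec = disk[sector]
    if zlib.crc32(rec["data"]) != rec["crc"]:
        return "corrupt"
    if rec["id"] != (disk_id, sector):
        return "misdirected"              # checksum is fine, location is not
    return "ok"


disk0 = {}
disk0[5] = make_record(0, 5, b"block A")  # written where it was intended
disk0[9] = make_record(0, 7, b"block B")  # meant for sector 7, landed on sector 9

status_good = read_block(disk0, 0, 5)
status_bad = read_block(disk0, 0, 9)
print(status_good, status_bad)  # ok misdirected
```

A plain checksum alone would pass the second read, because the data itself is intact; only the stored location exposes the problem.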
  • Slide 14
  • Lost Writes. Occur when the device informs the upper layer that a write has completed, but in fact it is never persisted. Do any of our strategies from above (e.g., basic checksums or physical IDs) help to detect lost writes? Solutions: perform a write verify (read-after-write). Some systems add a checksum elsewhere in the system to detect lost writes; ZFS includes a checksum in each file system inode and indirect block for every block included within a file.
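The write-verify idea from the slide above, as a toy simulation. `LossyDisk` and `write_verify` are hypothetical names, and the sketch assumes the read-back actually reaches the medium; on real hardware a read served from a cache would defeat the check.

```python
class LossyDisk:
    """A disk that reports success for every write but silently drops some."""

    def __init__(self, drop_writes_to=()):
        self.blocks = {}
        self.drop = set(drop_writes_to)

    def write(self, addr, data):
        if addr not in self.drop:
            self.blocks[addr] = data
        return True                       # always claims the write completed

    def read(self, addr):
        return self.blocks.get(addr)


def write_verify(disk, addr, data):
    """Write, then immediately read back and compare (read-after-write)."""
    disk.write(addr, data)
    return disk.read(addr) == data


disk = LossyDisk(drop_writes_to=[3])
persisted = write_verify(disk, 1, b"kept")
lost = write_verify(disk, 3, b"dropped")
print(persisted, lost)  # True False
```

The cost is visible in the code: every write now needs an extra read, which is why this approach roughly doubles the I/O per write.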
  • Slide 15
  • Scrubbing. When do these checksums actually get checked? Many systems use disk scrubbing: periodically read through every block of the system and check whether the checksums are still valid. Scans are typically scheduled on a nightly or weekly basis.
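One pass of a scrubber, sketched under the assumption that each block is stored next to its CRC. The `scrub` function and the `blocks` dictionary are illustrative names, and scheduling (nightly, weekly) is left out.

```python
import zlib


def scrub(blocks):
    """Return the addresses whose data no longer matches the stored checksum."""
    bad = []
    for addr in sorted(blocks):
        data, stored_crc = blocks[addr]
        if zlib.crc32(data) != stored_crc:
            bad.append(addr)
    return bad


# Four healthy blocks: address -> (data, checksum computed at write time).
blocks = {n: (bytes([n]) * 8, zlib.crc32(bytes([n]) * 8)) for n in range(4)}

# Simulate silent corruption of one byte in block 2; the stored CRC is unchanged.
data, crc = blocks[2]
blocks[2] = (b"\xff" + data[1:], crc)

bad_blocks = scrub(blocks)
print(bad_blocks)  # [2]
```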
  • Slide 16
  • Overhead of Checksumming. Space: small. Time: noticeable, both CPU overhead and I/O overhead.
  • Slide 17
  • Distributed Systems
  • Slide 18
  • OSTEP Definition. Def: more than 1 machine. Examples: client/server (web server and web client); cluster (PageRank computation). Other courses: Networking, Distributed Systems.
  • Slide 19
  • Why Go Distributed? More compute power. More storage capacity. Fault tolerance. Data sharing.
  • Slide 20
  • New Challenges. System failure: need to worry about partial failure. Communication failure: links are unreliable. Performance. Security.
  • Slide 21
  • Communication. All communication is inherently unreliable. Need to worry about: bit errors, packet loss, node/link failure.
  • Slide 22
  • Overview. Raw messages. Reliable messages. OS abstractions: virtual memory, global file system. Programming-language abstractions: remote procedure call.
  • Slide 23
  • Raw Messages: UDP. API: reads and writes over socket file descriptors; messages are sent from/to ports to target a process on a machine. Provides minimal reliability features: messages may be lost, reordered, or duplicated; the only protection is a checksum.
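A small sketch of the UDP socket API described above, using Python's standard `socket` module on the loopback interface. Both endpoints live in one process purely for illustration, and the 2-second timeout is an arbitrary choice. Nothing in the code guarantees delivery; on loopback the datagram normally arrives, but the receiver must be prepared for a timeout.

```python
import socket

# Receiver: bind to a port so the OS can route datagrams to this process.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))          # port 0 lets the OS pick a free port
receiver.settimeout(2.0)                 # do not block forever if the datagram is lost
port = receiver.getsockname()[1]

# Sender: no connection setup, just address each message to (host, port).
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", ("127.0.0.1", port))

try:
    data, addr = receiver.recvfrom(1024)
except socket.timeout:
    data, addr = None, None              # UDP gave no delivery guarantee
finally:
    sender.close()
    receiver.close()

print(data)
```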
  • Slide 24
  • Raw Messages: UDP. Advantages: lightweight; some applications make better reliability decisions themselves (e.g., video conferencing programs). Disadvantages: more difficult to write applications correctly.
  • Slide 25
  • Reliable Messages Strategy. Using software, build reliable, logical connections over unreliable connections. Strategies: acknowledgment.
  • Slide 26
  • ACK. Sender knows the message was received. Sender: [send message] [recv ack]. Receiver: [recv message] [send ack].
  • Slide 27
  • ACK Sender misses ACK... What to do? Sender [send message] Receiver
  • Slide 28
  • Reliable Messages Strategy. Using software, build reliable, logical connections over unreliable connections. Strategies: acknowledgment, timeout.
  • Slide 29
  • ACK. Sender: [send message] [start timer] ... waiting for ack ... [timer goes off] [send message] [recv ack]. Receiver: [recv message] [send ack].
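The send, time out, resend loop on the slide above, as a deterministic simulation. The channel is a plain function that returns an ack or `None` to stand in for an expired timer; `send_reliably` and `make_lossy_channel` are hypothetical names, and no real timers or sockets are involved.

```python
def send_reliably(message, channel, max_tries=5):
    """Resend until an ack arrives; return the number of transmissions used."""
    for attempt in range(1, max_tries + 1):
        ack = channel(message)            # None models "timer went off, no ack"
        if ack is not None:
            return attempt
    raise TimeoutError("no ack after %d tries" % max_tries)


def make_lossy_channel(drops):
    """Build a channel that loses the first `drops` transmissions."""
    state = {"left": drops}
    delivered = []                        # what the receiver actually got

    def channel(message):
        if state["left"] > 0:
            state["left"] -= 1
            return None                   # message lost in transit
        delivered.append(message)
        return "ack"

    return channel, delivered


channel, delivered = make_lossy_channel(drops=2)
attempts = send_reliably("m1", channel)
print(attempts, delivered)  # 3 ['m1']
```

This simulation only loses messages, never acks. If the ack were lost instead, the retry would deliver the message a second time, which is the problem the later slides on remembering messages address.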
  • Slide 30
  • Timeout: Issue 1. How long to wait? Too long: the system feels unresponsive! Too short: messages are needlessly re-sent! Messages may have been dropped because the server is overloaded, and aggressive clients make this worse. One strategy: be adaptive! Adjust the timeout based on how long acks usually take. For each missing ack, wait longer between retries.
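The two adaptive ideas above can be sketched in a few lines. The smoothing weight `alpha=0.125`, the doubling factor, and the 60-second cap are assumptions chosen for illustration (the weight matches the value TCP commonly uses for its smoothed round-trip estimate); the function names are hypothetical.

```python
def update_rtt_estimate(estimate, sample, alpha=0.125):
    """Exponentially weighted moving average of observed ack times."""
    return (1 - alpha) * estimate + alpha * sample


def backoff_timeouts(base, retries, factor=2.0, cap=60.0):
    """Timeout to use before each retry: grows geometrically up to a cap."""
    return [min(base * factor ** i, cap) for i in range(retries)]


# An ack that took 180 ms nudges a 100 ms estimate upward, but only slightly.
estimate = update_rtt_estimate(100.0, 180.0)
print(estimate)                     # 110.0

# Each missing ack doubles the wait, which eases load on a struggling server.
waits = backoff_timeouts(1.0, 4)
print(waits)                        # [1.0, 2.0, 4.0, 8.0]
```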
  • Slide 31
  • Timeout: Issue 2. What does a lost ack really mean? Maybe the receiver did not get the message. Maybe the receiver got the message, but the ack was not delivered. ACK: message received exactly once. No ACK: message received at most once. Proposed solution: the sender could send an AckAck so the receiver knows whether to retry sending an Ack. Sound good?
  • Slide 32
  • Reliable Messages Strategy. Using software, build reliable, logical connections over unreliable connections. Strategies: acknowledgment, timeout, remember sent messages.
  • Slide 33
  • Receiver Remembers Messages. Sender: [send message] [timeout] [send message] [recv ack]. Receiver: [recv message] [send ack] [ignore message] [send ack].
  • Slide 34
  • Solutions. Solution 1: remember every message ever sent. Solution 2: sequence numbers. Give each message a seq number; the receiver knows that all messages before N have been seen, so it only needs to remember messages sent after N.
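A minimal sketch of Solution 2, assuming a stop-and-wait sender that transmits one message at a time, so the receiver only needs a single counter. The `Receiver` class is hypothetical. A duplicate (a sequence number below the counter) is acked again but not delivered twice; anything ahead of the counter is ignored in this simplified version.

```python
class Receiver:
    def __init__(self):
        self.next_seq = 0                 # every seq below this has been seen
        self.delivered = []

    def on_message(self, seq, payload):
        if seq == self.next_seq:          # new message: deliver it once
            self.delivered.append(payload)
            self.next_seq += 1
        if seq < self.next_seq:           # new or duplicate: (re)send the ack
            return ("ack", seq)
        return None                       # from the future: ignore, no ack


r = Receiver()
acks = [
    r.on_message(0, "a"),
    r.on_message(0, "a"),                 # retransmission after a lost ack
    r.on_message(1, "b"),
]
print(r.delivered)  # ['a', 'b']
print(acks)         # [('ack', 0), ('ack', 0), ('ack', 1)]
```

The receiver's state is one integer regardless of how many messages have been exchanged, which is the advantage over remembering every message ever sent.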
  • Slide 35
  • TCP. Most popular protocol based on seq nums. Also buffers messages so they arrive in order. Timeouts are adaptive.
  • Slide 36
  • Overview. Raw messages. Reliable messages. OS abstractions: virtual memory, global file system. Programming-language abstractions: remote procedure call.
  • Slide 37
  • Virtual Memory. Inspiration: threads share memory. Idea: processes on different machines share memory. Strategy: a bit like the swapping we saw before; instead of swapping to disk, swap to another machine. Sometimes multiple copies may be in memory on different machines.
  • Slide 38
  • Virtual Memory Problems. What if a machine crashes? Mappings disappear on other machines; how to handle that? Performance? When to prefetch? Loads/stores are expected to be fast. DSM (distributed shared memory) is not used today.
  • Slide 39
  • Global File System. Advantages: file access is already expected to be slow; uses a common API; no need to modify applications (sorta true). Disadvantages: doesn't always make sense, e.g., for a video app.
  • Slide 40
  • RPC: Remote Procedure Call. What could be easier than calling a function? Strategy: create wrappers so that calling a function on another machine feels just like calling a local function. This abstraction is very common in industry.
  • Slide 41
  • RPC
    Machine A:
      int main(...) { int x = foo("hello"); }
      // client wrapper
      int foo(char *msg) { send msg to B; recv msg from B }
    Machine B:
      int foo(char *msg) { ... }
      // server wrapper
      void foo_listener() { while (1) { recv, call foo } }
  • Slide 42
  • RPC Tools. RPC packages help with this through two components. (1) Stub generation: create wrappers automatically. (2) Runtime library: thread pool, socket listeners, calling functions on the server.
  • Slide 43
  • Client Stub Steps. 1. Create a message buffer. 2. Pack the needed information into the message buffer. 3. Send the message to the destination RPC server. 4. Wait for the reply. 5. Unpack the return code and other arguments. 6. Return to the caller.
  • Slide 44
  • Server Stub Steps. 1. Unpack the message. 2. Call into the actual function. 3. Package the results. 4. Send the reply.
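The client and server stub steps above can be sketched with Python's standard `struct` module doing the packing. Everything here is a hand-written illustration under stated assumptions: the wire format (`"!Hii"`: a 16-bit procedure id and two 32-bit integers in network byte order), the procedure table, and the names `client_add` and `server_stub` are all hypothetical, and the "network" is a direct function call so that the example stays self-contained. A real RPC package would generate the stubs and supply the transport.

```python
import struct


def add(a, b):
    """The actual procedure, which lives on the server."""
    return a + b


PROCEDURES = {1: add}                     # procedure id -> function (assumed table)


def server_stub(request: bytes) -> bytes:
    proc_id, a, b = struct.unpack("!Hii", request)   # 1. unpack the message
    result = PROCEDURES[proc_id](a, b)               # 2. call the actual function
    return struct.pack("!i", result)                 # 3. package the results (4. reply)


def client_add(a, b, transport=server_stub):
    request = struct.pack("!Hii", 1, a, b)           # create and pack the message buffer
    reply = transport(request)                       # send it and wait for the reply
    (result,) = struct.unpack("!i", reply)           # unpack the return value
    return result                                    # return to the caller


total = client_add(2, 3)
print(total)  # 5
```

From the caller's point of view `client_add(2, 3)` looks like an ordinary local call; swapping `transport` for a function that sends the bytes over a socket would not change the calling code.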
  • Slide 45
  • Wrapper Generation Wrappers must do conversions: client arguments to message messa