Transcript
Page 1: Lecture 21 Distributed Systems. Checkpoint In journaling

Lecture 21Distributed

Systems

Page 2: Lecture 21 Distributed Systems. Checkpoint In journaling

Checkpoint

• In journaling

Page 3: Lecture 21 Distributed Systems. Checkpoint In journaling

Metadata Journaling

• 1/2. Data write: Write data to final location; wait for completion (the wait is optional; see below for details).• 1/2. Journal metadata write: Write the begin block and

metadata to the log; wait for writes to complete.• 3. Journal commit: Write the transaction commit block

(containing TxE) to the log; wait for the write to complete; the transaction (including data) is now committed.• 4. Checkpoint metadata: Write the contents of the metadata

update to their final locations within the file system.• 5. Free: Later, mark the transaction free in journal superblock

Page 4: Lecture 21 Distributed Systems. Checkpoint In journaling

Checkpoint

• In journaling• Write the contents of the update to their final locations

within the file system.

• In LFS• Checkpoint regions locate on a special fixed position on

disk.• Checkpoint region contains the addresses of all imap

blocks, current time, the address of the last segment written, etc.• What if checkpoint too often?

Page 5: Lecture 21 Distributed Systems. Checkpoint In journaling

LFS review

• Good for …?• Bad for …?

Page 6: Lecture 21 Distributed Systems. Checkpoint In journaling

Disk after Creating Two Files

Page 7: Lecture 21 Distributed Systems. Checkpoint In journaling

Garbage Collection

• Pick M segments, compact into N (where N < M).• Mechanism: how do we know whether data in

segments is valid?• segment summary that lists inode corresponding to

each data block

• Policy: which segments to compact?• A hot segment: the contents are being frequently over-

written• A cold segment: may have a few dead blocks but the rest

of its contents are relatively stable

Page 8: Lecture 21 Distributed Systems. Checkpoint In journaling

Recovery

• In journaling• If crash before step 2 completes, skip the pending

update• If crash after step 2 completes, transactions are replayed

• In LFS• Identify the newest consistent one• Roll-forward: scan BEYOND the last checkpoint to

recover max data

Page 9: Lecture 21 Distributed Systems. Checkpoint In journaling

Data Integrity and Protection

Page 10: Lecture 21 Distributed Systems. Checkpoint In journaling

Disk Failure Modes

• Fail-stop as assumed by RAID

• Silent faults:• Latent-sector errors (LSEs): a disk sector (or group of

sectors) has been damaged in some way• Block corruption

Page 11: Lecture 21 Distributed Systems. Checkpoint In journaling

Handling Latent Sector Errors• How to detect:• A storage system tries to access a block, and the disk

returns an error

• How to fix:• Use whatever redundancy mechanism it has to return

the correct data

Page 12: Lecture 21 Distributed Systems. Checkpoint In journaling

Detecting Corruption:The Checksum• Common Checksum Functions• XOR• Fletcher checksum• Cyclic redundancy check (CRC)• Collision is possible

Page 13: Lecture 21 Distributed Systems. Checkpoint In journaling

Misdirected Writes

• Arises in disk and RAID controllers which write the data to disk correctly, except in the wrong location• Physical identifier (physical ID)

Page 14: Lecture 21 Distributed Systems. Checkpoint In journaling

Lost Writes

• Occur when the device informs the upper layer that a write has completed but in fact it never is persisted。• Do any of our strategies from above (e.g., basic

checksums, or physical ID) help to detect lost writes?• Solutions:• Perform a write verify or read-after-write• Some systems add a checksum elsewhere in the system

to detect lost writes.• ZFS includes a checksum in each file system inode and

indirect block for every block included within a file

Page 15: Lecture 21 Distributed Systems. Checkpoint In journaling

Scrubbing

• When do these checksums actually get checked?

• Many systems utilize disk scrubbing:• Periodically read through every block of the system• Check whether checksums are still valid• Schedule scans on a nightly or weekly basis

Page 16: Lecture 21 Distributed Systems. Checkpoint In journaling

Overhead of Checksumming• Space• Small

• Time• Noticeable• CPU overhead• I/O overhead

Page 17: Lecture 21 Distributed Systems. Checkpoint In journaling

Distributed Systems

Page 18: Lecture 21 Distributed Systems. Checkpoint In journaling

OSTEP Definition

• Def: more than 1 machine

• Examples: • client/server: web server and web client • cluster: page rank computation

• Other courses• Networking • Distributed Systems

Page 19: Lecture 21 Distributed Systems. Checkpoint In journaling

Why Go Distributed?

• More compute power

• More storage capacity

• Fault tolerance

• Data sharing

Page 20: Lecture 21 Distributed Systems. Checkpoint In journaling

New Challenges

• System failure: need to worry about partial failure.

• Communication failure: links unreliable

• Performance

• Security

Page 21: Lecture 21 Distributed Systems. Checkpoint In journaling

Communication

• All communication is inherently unreliable.

• Need to worry about: • bit errors • packet loss • node/link failure

Page 22: Lecture 21 Distributed Systems. Checkpoint In journaling

Overview

• Raw messages

• Reliable messages

• OS abstractions • virtual memory • global file system

• Programming-languages abstractions • remote procedure call

Page 23: Lecture 21 Distributed Systems. Checkpoint In journaling

Raw Messages: UDP

• API: • reads and writes over socket file descriptors• messages sent from/to ports to target a process on

machine

• Provide minimal reliability features: • messages may be lost• messages may be reordered• messages may be duplicated• only protection checksums

Page 24: Lecture 21 Distributed Systems. Checkpoint In journaling

Raw Messages: UDP

• Advantages • lightweight • some applications make better reliability decisions

themselves (e.g., video conferencing programs)

• Disadvantages • more difficult to write application correctly

Page 25: Lecture 21 Distributed Systems. Checkpoint In journaling

Reliable Messages Strategy• Using software, build reliable, logical connections

over unreliable connections.

• Strategies: • acknowledgment

Page 26: Lecture 21 Distributed Systems. Checkpoint In journaling

ACK

• Sender knows message was received.

Sender [send message]

[recv ack]

Receiver

[recv message][send ack]

Page 27: Lecture 21 Distributed Systems. Checkpoint In journaling

ACK

• Sender misses ACK... What to do?

Sender [send message]

Receiver

Page 28: Lecture 21 Distributed Systems. Checkpoint In journaling

Reliable Messages Strategy• Using software, build reliable, logical connections

over unreliable connections.

• Strategies: • acknowledgment• timeout

Page 29: Lecture 21 Distributed Systems. Checkpoint In journaling

ACKSender

[send message] [start timer]

... waiting for ack ...[timer goes off][send message]

[recv ack]

Receiver

[recv message][send ack]

Page 30: Lecture 21 Distributed Systems. Checkpoint In journaling

Timeout: Issue 1

• How long to wait?• Too long: system feels unresponsive!• Too short: messages needlessly re-sent!

• Messages may have been dropped due to overloaded server. Aggressive clients worsen this.• One strategy: be adaptive!• Adjust time based on how long acks usually take. • For each missing ack, wait longer between retries.

Page 31: Lecture 21 Distributed Systems. Checkpoint In journaling

Timeout: Issue 2

• What does a lost ack really mean?• Maybe the receiver does not get the message• Maybe the receiver gets the message, but the ack is not

delivered successfully

• ACK: message received exactly once• No ACK: message received at most once• Proposed Solution• Sender could send an AckAck so receiver knows whether

to retry sending an Ack• Sound good?

Page 32: Lecture 21 Distributed Systems. Checkpoint In journaling

Reliable Messages Strategy• Using software, build reliable, logical connections

over unreliable connections.

• Strategies: • acknowledgment• timeout• remember sent messages

Page 33: Lecture 21 Distributed Systems. Checkpoint In journaling

Receiver Remembers MessagesSender

[send message]

[timeout][send message]

[recv ack]

Receiver

[recv message][send ack]

[ignore message][send ack]

Page 34: Lecture 21 Distributed Systems. Checkpoint In journaling

Solutions

• Solution 1: remember every message ever sent.

• Solution 2: sequence numbers• give each message a seq number• receiver knows all messages before an N have been seen • receiver remembers messages sent after N

Page 35: Lecture 21 Distributed Systems. Checkpoint In journaling

TCP

• Most popular protocol based on seq nums.

• Also buffers messages so they arrive in order

• Timeouts are adaptive.

Page 36: Lecture 21 Distributed Systems. Checkpoint In journaling

Overview

• Raw messages

• Reliable messages

• OS abstractions • virtual memory • global file system

• Programming-languages abstractions • remote procedure call

Page 37: Lecture 21 Distributed Systems. Checkpoint In journaling

Virtual Memory

• Inspiration: threads share memory

• Idea: processes on different machines share mem• Strategy: • a bit like swapping we saw before • instead of swap to disk, swap to other machine • sometimes multiple copies may be in memory on

different machines

Page 38: Lecture 21 Distributed Systems. Checkpoint In journaling

Virtual Memory Problems

• What if a machine crashes? • mapping disappears in other machines • how to handle?

• Performance? • when to prefetch? • loads/stores expected to be fast

• DSM (distributed shared memory) not used today.

Page 39: Lecture 21 Distributed Systems. Checkpoint In journaling

Global File System

• Advantages • file access is already expected to be slow • use common API • no need to modify applications (sorta true)

• Disadvantages • doesn’t always make sense, e.g., for video app

Page 40: Lecture 21 Distributed Systems. Checkpoint In journaling

RPC: Remote Procedure Call• What could be easier than calling a function?

• Strategy: create wrappers so calling a function on another machine feels just like calling a local function.

• This abstraction is very common in industry.

Page 41: Lecture 21 Distributed Systems. Checkpoint In journaling

RPCMachine A

int main(...) { int x = foo(); }

// client wrapperint foo(char *msg) { send msg to B recv msg from B }

Machine Bint foo(char *msg) { ... }

// server wrapper

void foo_listener() { while(1) { recv, call foo } }

Page 42: Lecture 21 Distributed Systems. Checkpoint In journaling

RPC Tools

• RPC packages help with this with two components.

• (1) Stub generation • create wrappers automatically

• (2) Runtime library • thread pool • socket listeners call functions on server

Page 43: Lecture 21 Distributed Systems. Checkpoint In journaling

Client Stub Steps

• Create a message buffer• Pack the needed information into the message

buffer• Send the message to the destination RPC server• Wait for the reply• Unpack return code and other arguments• Return to the caller

Page 44: Lecture 21 Distributed Systems. Checkpoint In journaling

Server Stub Steps

• Unpack the message• Call into the actual function• Package the results• Send the reply

Page 45: Lecture 21 Distributed Systems. Checkpoint In journaling

Wrapper Generation

• Wrappers must do conversions: • client arguments to message• message to server arguments • server return to message• message to client return

• Need uniform endianness (wrappers do this).

• Conversion is called marshaling/unmarshaling, or serializing/deserializing.

Page 46: Lecture 21 Distributed Systems. Checkpoint In journaling

Stub Generation

• Many tools will automatically generate wrappers: • rpcgen • thrift • protobufs

• Programmer fills in generated stubs.

Page 47: Lecture 21 Distributed Systems. Checkpoint In journaling

Wrapper Generation: Pointers• Why are pointers problematic?• The addr passed from the client will not be valid on the

server.

• Solutions? • smart RPC package: follow pointers • distribute generic data structs with RPC package

Page 48: Lecture 21 Distributed Systems. Checkpoint In journaling

Runtime Library

• Naming: how to locate a remote service

• How to serve calls? • usually with a thread pool

• What underlying protocol to use? • usually UDP

• Some RPC packages enable a asynchronous RPC

Page 49: Lecture 21 Distributed Systems. Checkpoint In journaling

RPC over UDP

• Strategy: use function return as implicit ACK.

• Piggybacking technique.

• What if function takes a long time? • then send a separate ACK

Page 50: Lecture 21 Distributed Systems. Checkpoint In journaling

Conclusion

• Many communication abstraction possible: • Raw messages (UDP) • Reliable messages (TCP) • Virtual memory (OS) • Global file system (OS) • Function calls (RPC)

Page 51: Lecture 21 Distributed Systems. Checkpoint In journaling

Next

• NFS• AFS


Recommended