Distributed File Systems, Brown University (cs.brown.edu/courses/csci1380/s17/lectures/15dfs.pdf)


CS 138 XV–1 Copyright © 2017 Thomas W. Doeppner. All rights reserved.

Distributed File Systems


Outline

•  Failure
•  Basic concepts
•  NFS version 2
•  CIFS
•  DCE DFS
•  NFS version 4


Synchronous vs. Asynchronous

•  Execution speed
   – synchronous: bounded
   – asynchronous: unbounded

•  Message transmission delays
   – synchronous: bounded
   – asynchronous: unbounded

•  Local clock drift rate
   – synchronous: bounded
   – asynchronous: unbounded


Failures

•  Omission failures
   – something doesn’t happen
      -  process crashes
      -  data lost in transmission
      -  etc.

•  Byzantine (arbitrary) failures
   – something bad happens
      -  message is modified
      -  message received twice
      -  etc.

•  Timing failures
   – something takes too long


Detecting Crashes

•  Synchronous systems
   – timeouts

•  Asynchronous systems
   – ?

•  Fail-stop
   – an oracle lets us know
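A minimal sketch of timeout-based crash detection, assuming processes send periodic heartbeats. The class name, the heartbeat scheme, and the injectable clock are illustrative assumptions, not part of the slides. With bounded delays (the synchronous case) a missed deadline implies a crash; in an asynchronous system the same code can only report a suspicion.

```python
import time


class TimeoutDetector:
    """Suspect a process if no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock        # injectable so the sketch can be tested
        self.last_seen = {}       # process id -> time of last heartbeat

    def heartbeat(self, process_id):
        # Record that the process was alive at the current time.
        self.last_seen[process_id] = self.clock()

    def suspected(self):
        # Processes whose last heartbeat is older than the timeout.
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout)
```

The timeout would have to exceed the bound on message delay plus the heartbeat interval for the verdict to be trustworthy.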


Gossip Scenario

[Figure: four clients and three replica managers]


DFS Scenario

[Figure: four clients, each with its own cache; a file server; four data servers]


DFS Components

•  Data state
   – file contents

•  Attribute state
   – size, access-control info, modification time, etc.

•  Open-file state
   – which files are in use (open)
   – lock state


Possible Locations

[Figure: two clients, each with a data cache, an attr cache, and open-file state; a server with a data cache, an attr cache, a local file system, and open-file state]


In Practice …

•  Data state
   – NFS
      -  weakly consistent
      -  less weak if program uses locks
   – CIFS and DFS
      -  strictly consistent

•  Lock state
   – must be strictly consistent


Thursday morning, November 17th, at 7:00 a.m.

Maytag, the department’s central file server, will be taken down to kick off a filesystem consistency check.

Linux machines will hang. All Windows users should log off.

Normal operation will resume by 8:30 a.m. if all goes well. All Windows users should log off before this time.

Questions/concerns to [email protected]


Failures in a Local File System

[Figure, shown twice: a server with an on-disk file system, a cache, and open-file state (a table indexed 0 through n–1 with an entry "1 rw 0"), and four clients]


Distributed Failure

[Figure, two panels: a server with an on-disk file system, a cache, and open-file state (a table indexed 0 through n–1 with an entry "1 rw 0"); four clients in the first panel, one client in the second]


In Practice …

•  NFS version 2
   – relaxed approach to consistency
   – handles failures well

•  CIFS
   – strictly consistent
   – intolerant of failures

•  DCE DFS
   – strictly consistent
   – sort of tolerant of failures

•  NFS version 4
   – either relaxed or strictly consistent
   – handles failures very well


NFS Version 2

•  Released in mid-1980s
•  Three protocols in one
   – file protocol
   – mount protocol
   – network lock manager protocol

[Diagram labels: Basic NFS, Extended NFS]


Distribution of Components

[Figure: two NFSv2 clients, each with a data cache, an attr cache, and open-file state; an NFSv2 server with a data cache, an attr cache, and a local file system]


Consistency in Basic NFSv2

[Figure: a data cache holding file x blocks 1 and 5 and file y blocks 2 and 17; an attribute cache holding file x attrs and file y attrs, each with a validity period]
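One way to read the figure, sketched under stated assumptions: the client trusts cached attributes for a validity period, re-fetches them once the period expires, and drops a file's cached blocks when the modification time has changed. The class, the dictionary standing in for the server, and the use of modification time as the change indicator are illustrative assumptions, not details taken from the slides.

```python
class AttrCachingClient:
    """Toy NFS-like client with a data cache and a timed attribute cache."""

    def __init__(self, server, validity, clock):
        self.server = server      # hypothetical: name -> (mtime, list of blocks)
        self.validity = validity  # seconds for which cached attrs are trusted
        self.clock = clock
        self.attrs = {}           # name -> (mtime, time fetched)
        self.blocks = {}          # (name, block number) -> data

    def _validate(self, name):
        cached = self.attrs.get(name)
        now = self.clock()
        if cached and now - cached[1] < self.validity:
            return                # still within the validity period (may be stale)
        mtime = self.server[name][0]      # stands in for a get-attributes RPC
        if cached and cached[0] != mtime:
            # File changed on the server: discard its cached blocks.
            self.blocks = {k: v for k, v in self.blocks.items() if k[0] != name}
        self.attrs[name] = (mtime, now)

    def read(self, name, block_no):
        self._validate(name)
        key = (name, block_no)
        if key not in self.blocks:
            self.blocks[key] = self.server[name][1][block_no]  # stands in for a read RPC
        return self.blocks[key]
```

Within the validity period a read can return stale data, which is the weak consistency mentioned earlier.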


More …

•  All write RPC requests must be handled synchronously on the server

•  Close-to-Open consistency
   – client writes back all changes on close
   – flushes cached file info on open
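Close-to-open consistency can be sketched as follows. This is a toy model that caches whole files rather than blocks; the class and method names are assumptions made for illustration.

```python
class Server:
    def __init__(self):
        self.files = {}           # name -> contents


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}           # name -> contents as seen by this client
        self.dirty = set()        # names with changes not yet written back

    def open(self, name):
        # Flush cached info for the file and fetch the server's current copy.
        self.cache.pop(name, None)
        self.cache[name] = self.server.files.get(name, "")

    def read(self, name):
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data   # buffered locally until close
        self.dirty.add(name)

    def close(self, name):
        if name in self.dirty:
            self.server.files[name] = self.cache[name]  # write back all changes
            self.dirty.discard(name)
```

A client that opened the file before another client's close keeps seeing its old copy until it opens the file again.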


Client Crash Recovery

[Figure: a server with a data cache; Process A and Process B, each with a data cache]


Server Crash Recovery

[Figure: a server with a data cache; Process A and Process B, each with a data cache]


File Locking

•  State is required on the server!
   – recovery must take place in the event of client and server crashes
•  Locking Protocol is independent of the File Protocol
   – locking is advisory
   – one can lock a file and ask if a file is locked
   – not required to honor locks
      -  may read/write a file locked by others!
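A small model of what "advisory" means here, with a lock table that is kept entirely separate from the write path. All names are hypothetical; this is not the network lock manager's actual interface.

```python
class AdvisoryLockTable:
    """Whole-file advisory locks: state lives here, not in the file protocol."""

    def __init__(self):
        self.owner = {}           # file name -> current lock holder

    def lock(self, name, who):
        # Succeeds if the file is unlocked or already locked by `who`.
        if self.owner.get(name, who) != who:
            return False
        self.owner[name] = who
        return True

    def test(self, name):
        # Ask whether a file is locked; returns the holder or None.
        return self.owner.get(name)

    def unlock(self, name, who):
        if self.owner.get(name) == who:
            del self.owner[name]


def write(files, name, data):
    # The file protocol never consults the lock table, so a process that
    # ignores the locks can still write.
    files[name] = data
```

Cooperating processes get mutual exclusion only if every one of them calls `lock` before touching the file.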


Network Lock Manager Protocol

[Figure: a server, Process A, and Process B, each with a data cache and its own lockd and statd]


NFS Version 3

•  In use at Brown and in most of the rest of the world

•  Basically the same as NFSv2
   – improved handling of attributes
   – commit operation for writes
   – various other things


CIFS

•  Common Internet File System
   – Microsoft’s distributed file system

•  Features
   – strictly consistent

•  Not featured …
   – depends on reliability of transport protocol
   – loss of connection == loss of session


History

•  Originally a simple means for sharing files
   – developed by IBM and called server message block protocol (SMB)
   – ran on top of NetBIOS

•  Microsoft took over
   – renamed CIFS in late 1990s
   – uses SMB as RPC-like communication protocol
      -  runs on NetBIOS
      -  usually layered on TCP
      -  sometimes no NetBIOS, just TCP


Consistency vs. Performance

•  Strict consistency is easy …
   – … if all operations take place on server
   – no client caching

•  Performance is good …
   – … if all operations take place on client
   – everything is cached on client

•  Put the two together …
   ø
   – or you can do opportunistic locking


Opportunistic Locks

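[Figure: message exchange among Client 1, Client 2, and Server]

1)  Client 1 to Server: Open A
2)  Server to Client 1: OK, Op Lock
3)  Client 2 to Server: Open A
4)  Server to Client 1: Revoke Op Lock
5)  Client 1 to Server: OK, changes
6)  Server to Client 2: OK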
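The exchange can be sketched as a toy model: the first opener gets an op lock and may cache writes locally; a second opener causes the server to revoke the op lock and collect the cached changes. Class and method names are assumptions, files are cached whole, and close is not modeled.

```python
class OplockServer:
    def __init__(self):
        self.contents = {}        # file name -> data held on the server
        self.holder = {}          # file name -> client holding the op lock
        self.shared = set()       # files opened by more than one client

    def open(self, client, name):
        if name in self.shared:
            return "OK"           # no op locks once a file is shared
        current = self.holder.get(name)
        if current is None or current is client:
            self.holder[name] = client
            return "OK, Op Lock"
        # Another client holds the op lock: revoke it and take its changes.
        changes = current.revoke(name)
        if changes is not None:
            self.contents[name] = changes
        del self.holder[name]
        self.shared.add(name)
        return "OK"


class OplockClient:
    def __init__(self, server):
        self.server = server
        self.cached = {}          # files cached under an op lock

    def open(self, name):
        reply = self.server.open(self, name)
        if reply == "OK, Op Lock":
            self.cached[name] = self.server.contents.get(name)
        return reply

    def write(self, name, data):
        if name in self.cached:
            self.cached[name] = data           # local only while op lock is held
        else:
            self.server.contents[name] = data  # otherwise write through

    def revoke(self, name):
        # Give up the op lock and hand back whatever was cached.
        return self.cached.pop(name, None)
```

After the revocation both clients send every write to the server, which is what keeps the file strictly consistent.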


Back to NFS

•  File system name space
   – how is distributed file system perceived on clients?
•  Cross-computer links
   – how are files on other computers referred to?


NFS Mount Protocol

[Figure: a client, a server, and an approved list]


File Handles

•  Servers provide opaque file handles to clients to refer to files
   – contents mean nothing to clients
   – identify files on server

•  Clients contact server via mount protocol to obtain file handles of roots of exported file systems


File Handle Contents

•  File-System ID
   – which server file system

•  File ID
   – which file within file system

•  Generation #
   – guards against inode reuse

[Figure: file handle layout: File-System ID | File ID | Generation #]
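A sketch of how a generation number can guard against inode reuse. The 12-byte layout (three 32-bit fields) and the toy server are assumptions chosen for illustration; real handles are opaque and their format is up to the server.

```python
import struct

# Assumed layout for illustration: file-system ID, file ID, generation number.
HANDLE = struct.Struct("!III")


def make_handle(fs_id, file_id, generation):
    return HANDLE.pack(fs_id, file_id, generation)


class ToyServer:
    def __init__(self, fs_id):
        self.fs_id = fs_id
        self.generation = {}      # inode number -> current generation

    def create(self, inode):
        # Each (re)use of an inode bumps its generation number.
        self.generation[inode] = self.generation.get(inode, 0) + 1
        return make_handle(self.fs_id, inode, self.generation[inode])

    def lookup(self, handle):
        fs_id, inode, gen = HANDLE.unpack(handle)
        if fs_id != self.fs_id or self.generation.get(inode) != gen:
            raise LookupError("stale file handle")
        return inode
```

A handle issued before the inode was reused carries the old generation number, so the server can reject it instead of silently referring to the new file.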


Server File Systems

[Figure: server directory tree rooted at /, with nodes labeled A through Z and the labels 1, 2, and 3]


Client vs. Server Mount Points (1)

[Figure: the server directory tree (root /, nodes A through Z, labels 1, 2, and 3) next to a client tree with root / and directories C1 and C2]

mount server:/B /C2


Client vs. Server Mount Points (2)

[Figure: the server directory tree (root /, nodes A through Z, labels 1, 2, and 3) next to a client tree with root / and directories C1 and C2]

mount server:/B /C2

mount server:/B/F/K /C2/F/K
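The two mount commands can be read as entries in a client-side mount table. The sketch below resolves a client path to a server and remote path by longest matching mount point; the function and the table format are illustrative assumptions, not how any particular kernel implements it.

```python
def resolve(mounts, path):
    """Return (server, remote_path) for a client path, or None if it is local."""
    best = None
    for mount_point in mounts:
        if path == mount_point or path.startswith(mount_point + "/"):
            # Prefer the deepest mount point that covers the path.
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return None
    server, exported = mounts[best]
    return server, exported + path[len(best):]


# Mount table corresponding to the two commands above.
mounts = {
    "/C2": ("server", "/B"),
    "/C2/F/K": ("server", "/B/F/K"),
}
```

For example, `resolve(mounts, "/C2/F/K/x")` uses the second entry, while a path under `/C1` is not covered by any mount and stays local.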


Local vs. Global Namespace

•  Local namespace
   – each host configures its own file-system namespace
   – NFS clients each mount the appropriate remote file systems
•  Global namespace
   – all hosts share the same namespace
   – not done in early NFS


Mount Protocol Problems

•  Local namespaces don’t work
•  Achieve global name space by having each client mount everything consistently
   •  giving each client a table listing all possible mounts is administratively difficult
   •  performing all possible mounts is time-consuming
   •  mounting is a “heavyweight” operation


Rather than this …

[Figure: directory tree with home, etc, and dev; under home: carlos, rohil, rodrigo, max, louisa, twd, atty, haris, ishan]


… this

[Figure: directory tree with home, etc, and dev; home is handled by Autofs, which consults an automount database]


Automounting: 2000

•  Maintain description of global namespace in global database: NIS

•  Do mounts only when needed
•  Automount times out after a period of disuse
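A minimal sketch of on-demand mounting with an idle timeout. The dictionary stands in for the global database, and the mount itself is only recorded, not performed; all names and the example path are hypothetical.

```python
class Automounter:
    def __init__(self, database, idle_timeout, clock):
        self.database = database          # hypothetical: mount point -> "server:/path"
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.mounted = {}                 # mount point -> time of last use

    def access(self, mount_point):
        if mount_point not in self.database:
            raise FileNotFoundError(mount_point)
        # A real automounter would perform the mount here if it is not
        # already in place; the sketch only records the time of use.
        self.mounted[mount_point] = self.clock()
        return self.database[mount_point]

    def expire(self):
        # Drop mounts that have not been used for idle_timeout seconds.
        now = self.clock()
        idle = [m for m, t in self.mounted.items()
                if now - t >= self.idle_timeout]
        for mount_point in idle:
            del self.mounted[mount_point]  # stands in for unmounting
```

Nothing is mounted until a path is first accessed, so clients pay the cost of a mount only for the file systems they actually use.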


Automounting: 2017

•  Global namespace maintained in LDAP database
   – lightweight directory access protocol
      -  vendor neutral
   – everything mounted at boot time
      -  fewer, but larger, file systems
   – no timeout


DCE’s DFS

[Figure: namespace tree rooted at fs (system, users, projects; sol7, osf1; bin, usr; twd; motif, dce), six clients, three FLDB instances, and four servers]


DFS Mount Points

[Figure: namespace tree rooted at fs (system, users, projects; sol7, osf1; bin, usr; twd; motif, dce), six clients, three FLDB instances, and four servers. Labels shown: root.cell, bin.sol7, bin.osf1, users.twd, proj.motif, proj.dce]


Strict Consistency in DFS

[Figure: namespace tree rooted at fs (system, users, projects; sol7, osf1; bin, usr; twd; motif, dce), six clients, three FLDB instances, and four servers]


DFS Tokens (1)

[Figure: the DFS namespace tree, three FLDB instances, four servers, and two clients. Token labels shown: File A: Read:0-4095 and File B: Write:0-512; File A: Read:0-4095 and File B: Write:513-4096]


DFS Tokens (2)

[Figure: the DFS namespace tree, three FLDB instances, four servers, and two clients. Token labels shown: File A: Read:0-4095 (twice). Message: Write(A, 0-4095)]


DFS Tokens (3)

[Figure: the DFS namespace tree, three FLDB instances, four servers, and two clients. Token labels shown: File A: Read:0-4095 (twice). Message: Revoke(A, Read, 0-4095)]


DFS Tokens (4)

[Figure: the DFS namespace tree, three FLDB instances, four servers, and two clients. Token labels shown: File A: Read:0-4095; File A: Write:0-4095. Message: Grant(A, write, 0-4095)]


DFS Tokens (5)

[Figure: the DFS namespace tree, three FLDB instances, four servers, and two clients. Token labels shown: File A: Read:0-4095; File A: Write:0-4095. Message: Read(A, 0-4095)]


DFS Tokens (6)

[Figure: the DFS namespace tree, three FLDB instances, four servers, and two clients. Token labels shown: File A: Read:0-4095; File A: Write:0-4095]


DFS Tokens (7)

[Figure: the DFS namespace tree, three FLDB instances, four servers, and two clients. Token labels shown: File A: Read:0-4095 (twice). Message: Grant(A, read, 0-4095)]
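The token sequence in these figures can be sketched as a small token manager: a request conflicts with a token held by another client when the byte ranges overlap and at least one of the two is a write token. Conflicting tokens are revoked before the new one is granted. The class, the tuple format, and the message log are assumptions made for illustration, not DCE DFS's actual interfaces.

```python
class TokenManager:
    def __init__(self):
        self.tokens = []      # (client, file, mode, first, last), ranges inclusive
        self.log = []         # messages the server would send

    @staticmethod
    def _conflict(held, client, file, mode, first, last):
        h_client, h_file, h_mode, h_first, h_last = held
        overlap = h_first <= last and first <= h_last
        return (h_client != client and h_file == file and overlap
                and "write" in (h_mode, mode))

    def request(self, client, file, mode, first, last):
        keep = []
        for held in self.tokens:
            if self._conflict(held, client, file, mode, first, last):
                # A real client would write back cached data before replying.
                self.log.append(("Revoke",) + held)
            else:
                keep.append(held)
        self.tokens = keep + [(client, file, mode, first, last)]
        self.log.append(("Grant", client, file, mode, first, last))
```

Read tokens on the same range coexist, and so do write tokens on disjoint ranges, which matches the token labels in the first figure.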


DFS Crash Recovery

[Figure: a server and two clients]


DFS Crash Recovery (notes continued)

[Figure: a server and two clients]


DFS Recovery Problems

•  Client application must participate!
   – must recognize that operation returns “timed-out” error
   – must retry

•  Due to semantics of tokens, it isn’t feasible to provide NFS-style hard mount


NFS Version 4

•  Better than …
   – NFS version 2
   – NFS version 3
   – CIFS
   – DFS
   – (why aren’t we running it?)


NFSv4: Why?

•  Problems with NFSv3
   – doesn’t provide exact Unix file semantics
   – doesn’t support mandatory locks
   – doesn’t cope with Byzantine failures


Server State

•  It’s required!
   – exact Unix semantics
   – mandatory locks


State Recovery

•  Server crash recovery
   – clients reclaim state on server
      -  grace period after crash during which no new state may be established

•  Client crash recovery
   – server detects crash and nullifies client state information on server


Coping with Non-Responsiveness

•  Leases
   – locks are granted for a fixed period of time
      -  server-specified lease
   – if lease not renewed before expiration, server may (unilaterally) revoke locks and share reservations
      -  most client RPCs renew leases
   – clients must contact server periodically
      -  if clientid is rejected as stale, then server has restarted
      -  server’s grace period is equal to lease period
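A toy model of the lease rules above. Client IDs embed the server's boot time so that an ID issued by an earlier incarnation is rejected as stale; that encoding, the class names, and the absence of lock-conflict checks are simplifying assumptions, not the NFSv4 wire protocol.

```python
class StaleClientId(Exception):
    pass


class LeaseServer:
    def __init__(self, lease_period, clock):
        self.lease_period = lease_period
        self.clock = clock
        self.boot_time = clock()  # identifies this server incarnation
        self.expires = {}         # client id -> lease expiration time
        self.locks = {}           # client id -> set of held locks

    def in_grace_period(self):
        # The grace period after a restart equals the lease period.
        return self.clock() - self.boot_time < self.lease_period

    def new_client(self, name):
        client_id = (self.boot_time, name)
        self.expires[client_id] = self.clock() + self.lease_period
        self.locks[client_id] = set()
        return client_id

    def revoke_expired(self):
        now = self.clock()
        for client_id, deadline in self.expires.items():
            if now > deadline:
                self.locks[client_id].clear()  # server may revoke unilaterally

    def rpc(self, client_id, lock=None):
        """Any RPC renews the lease; optionally acquire `lock`."""
        if client_id[0] != self.boot_time:
            raise StaleClientId(client_id)     # server has restarted
        self.revoke_expired()
        self.expires[client_id] = self.clock() + self.lease_period
        if lock is not None:
            self.locks[client_id].add(lock)
        return set(self.locks[client_id])
```

A client that keeps issuing RPCs within the lease period keeps its locks; one that falls silent for longer finds them gone when it returns.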


Pathological Network Problems

1)  Client 1 obtains a lock on a portion of a file
2)  There’s a network partition such that client 1 and server can no longer communicate
3)  The server crashes and restarts
4)  Client 2 obtains a lock on the same portion of the same file, modifies the file, and then releases the lock
5)  The server crashes and restarts and the network partition is repaired
6)  Client 1 recontacts the server and reclaims its lock


Coping …

•  Possibilities
   1)  server keeps all client state in non-volatile storage
   2)  server keeps all client state in volatile storage and refuses all reclaim requests (effectively emulating CIFS)
   3)  something in between …


Compromise

•  Keep enough client state in non-volatile memory to know which clients were active at time of crash
   – will honor reclaim requests from these clients
   – will refuse reclaim requests from others

•  What to keep:
   – client ID
   – the time of the client’s first acquisition of a share reservation or lock after a server reboot or client lease expiration
   – a flag indicating whether the client’s most recent state was revoked because of a lease expiration
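One possible decision rule built on the three recorded items, sketched under an assumption the slides do not state: a reclaim is honored only if the client's recorded first acquisition falls within the server incarnation that just crashed (at or after the previous boot) and its state was not revoked. The function and record names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClientRecord:
    client_id: str
    first_acquired: float     # first lock or share reservation since reboot or lease expiry
    revoked: bool = False     # most recent state revoked due to lease expiration


def may_reclaim(records, client_id, previous_boot):
    """Decide a reclaim request arriving in the grace period after a restart.

    Assumed rule: the client must be recorded, not revoked, and must have
    established its state during the incarnation that just crashed.
    """
    rec = records.get(client_id)
    if rec is None or rec.revoked:
        return False
    return rec.first_acquired >= previous_boot
```

In the pathological scenario, client 1's record predates the previous boot because it never reclaimed during the intervening incarnation, so under this rule its late reclaim is refused.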


DFS ≠ LFS

•  Servers might give up on non-crashed clients
   – clients may lose locks
   – clients may lose files
   – NFSv4 attempts to make such things unlikely