GPFS: General Parallel File System
11. 20. 2008
Outline
• What’s GPFS
• Main features of GPFS
• Parallelism & Consistency
• Fault Tolerance
• GPFS vs. GFS (Google File System)
• Summary
What is GPFS (1/2)
A parallel, shared‐disk file system for cluster computers
Developed by IBM since 1993
Available on AIX, Linux and MS Windows Server 2003.
Used on 6 of the 10 most powerful supercomputers in the world. Extreme scalability!
What is GPFS (2/2)
Extreme scalability from its shared-disk architecture
Disk number: up to 4k
Disk size: up to 1TB
[Figure: shared-disk architecture: file system nodes connected to shared disks through a switching fabric]
Main Features of GPFS
• High Performance: multiple GB/s to/from a single file
– user data access: parallel!
– metadata access: parallel!
– administrative actions: parallel!
• Parallelism & Consistency (coming soon!)
• High Availability: logging + replication + RAID support => Fault Tolerance
• Good Scalability
Outline
What’s GPFS
Main features of GPFS
• Parallelism & Consistency
• Fault Tolerance
• GPFS vs. GFS (Google File System)
• Summary
Parallelism & Consistency
Two approaches to preserve consistency:
• Distributed Locking
• Centralized Management
Parallel data access:
• file data: distributed byte-range locking
• file metadata: centralized management
• allocation maps: distributed locking + centralized hint
Distributed Locking (1/3)
acquire r/w lock before every data operation
dirty data & metadata flushed to disk before revoke
[Figure: lock-token protocol between two cluster nodes and the token server]
(1) node1 asks the token server to acquire a lock
(2) the token server grants node1 the lock token
(3) node1 holds the lock token on the object
(4) node2 asks to acquire the same lock
(5) the token server revokes node1's token (node1 flushes dirty state first)
(6) the token server grants node2 the lock token
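A minimal Python sketch of this token protocol (hypothetical class and method names; the real GPFS token manager handles many token types and shared as well as exclusive modes):

```python
class TokenServer:
    """Grants lock tokens; revokes the current holder's token on conflict."""

    def __init__(self):
        self.holders = {}  # object id -> node currently holding the token

    def acquire(self, node, obj):
        holder = self.holders.get(obj)
        if holder is not None and holder is not node:
            # Conflicting request: revoke the current holder's token first.
            holder.revoke(obj)
        self.holders[obj] = node


class Node:
    def __init__(self, name, server):
        self.name, self.server = name, server
        self.dirty = set()  # locally cached, not-yet-flushed updates

    def write(self, obj):
        self.server.acquire(self, obj)  # steps (1)-(2): get the lock token
        self.dirty.add(obj)             # cache the update under the token

    def revoke(self, obj):
        # Dirty data & metadata are flushed to disk before the token is given up.
        if obj in self.dirty:
            print(f"{self.name}: flushing {obj} to disk before revoke")
            self.dirty.discard(obj)


server = TokenServer()
node1, node2 = Node("node1", server), Node("node2", server)
node1.write("object A")  # (1)-(3): node1 holds the token
node2.write("object A")  # (4)-(6): server revokes node1, grants node2
```

The point of the token: once node1 holds it, repeated local operations on the same object need no further messages; server traffic occurs only when another node makes a conflicting request.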
Distributed Locking (2/3)
Problem: high bandwidth => multiple nodes write a single file in parallel
Solution: lock byte ranges instead of whole files/blocks [Byte Range Locking]
Assuming node1, node2, … arrive in order and start writing to the same file at offsets c1, c2, …:
T1: node1 holds the token for file range (0, infinity)
T2: node1: (0, c2), node2: (c2, inf), assuming c1 < c2
    or node1: (c1, inf), node2: (0, c1), assuming c1 > c2
T3: node1: (0, c2), node2: (c2, c3), node3: (c3, inf)
… (the splitting continues as more nodes join; see the sketch below)
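A small sketch of how the byte-range tokens end up partitioned (hypothetical helper; in GPFS the ranges are negotiated incrementally between nodes as each write arrives, not computed in one place):

```python
INF = float("inf")

def split_ranges(offsets):
    """offsets[i]: where node i+1 starts writing. Returns the byte-range
    token each node ends up holding after all nodes have joined."""
    order = sorted(range(len(offsets)), key=lambda i: offsets[i])
    tokens = {}
    for pos, i in enumerate(order):
        start = 0 if pos == 0 else offsets[i]  # lowest writer keeps 0..
        end = offsets[order[pos + 1]] if pos + 1 < len(order) else INF
        tokens[f"node{i + 1}"] = (start, end)
    return tokens

print(split_ranges([0]))            # T1: node1 (0, inf)
print(split_ranges([0, 100]))       # T2, c1 < c2: node1 (0, 100), node2 (100, inf)
print(split_ranges([150, 50]))      # T2, c1 > c2: node1 (150, inf), node2 (0, 150)
print(split_ranges([0, 100, 200]))  # T3: node1 (0, 100), node2 (100, 200), node3 (200, inf)
```

Writers at disjoint offsets each end up with a private range, so after the initial negotiation they generate no further token traffic.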
Distributed Locking (3/3)
• Byte-range locking: for user data
• Distributed lock + hints: for allocation maps
Both scale well! (An allocation-map sketch follows below.)
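A rough sketch of the allocation-map idea (hypothetical names and a deliberately simplified model): the map is split into regions, each protected by its own distributed lock, while a central allocation manager only hands out hints about which region a node should try.

```python
class AllocationManager:
    """Central node keeping approximate per-region free-space counts.
    Its answers are hints, not locks: correctness still comes from the
    per-region distributed locks that nodes take before allocating."""

    def __init__(self, n_regions, blocks_per_region):
        self.free_hint = [blocks_per_region] * n_regions
        self.next_region = 0

    def suggest_region(self):
        # Round-robin over regions believed non-empty, so different nodes
        # are steered to different regions and rarely contend for the
        # same region lock.
        n = len(self.free_hint)
        for _ in range(n):
            r = self.next_region
            self.next_region = (self.next_region + 1) % n
            if self.free_hint[r] > 0:
                return r
        raise RuntimeError("file system full")

    def report_usage(self, region, blocks_used):
        self.free_hint[region] -= blocks_used  # keep the hint roughly current


mgr = AllocationManager(n_regions=4, blocks_per_region=1000)
r1 = mgr.suggest_region()  # node1 is steered to region 0
r2 = mgr.suggest_region()  # node2 is steered to region 1
# each node takes the distributed lock on its own region, then allocates
mgr.report_usage(r1, 10)
```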
Centralized Management
All conflicting operations are forwarded to a designated node, which performs the requested read or update.
Used for access to file metadata. Why?
every metadata write => lock conflict (heavy overhead…)
solution: use a metanode to collect/merge metadata updates
Metanode:
• elected dynamically, one per open file
• deleted dynamically
(see the merge sketch below)
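A minimal merge sketch (hypothetical; the real metanode also handles directory and indirect blocks). The useful property is that size and mtime updates are order-independent, so the metanode can merge them without each writer taking an inode lock:

```python
class Metanode:
    """One per open file: collects inode updates from all writing nodes
    and merges them into a single metadata write."""

    def __init__(self):
        self.size, self.mtime = 0, 0.0

    def merge(self, size_seen, mtime_seen):
        # Both fields only grow, so merging just takes the maximum;
        # the arrival order of updates does not matter.
        self.size = max(self.size, size_seen)
        self.mtime = max(self.mtime, mtime_seen)


meta = Metanode()
meta.merge(size_seen=4096, mtime_seen=100.0)   # from node1
meta.merge(size_seen=16384, mtime_seen=99.5)   # from node2
meta.merge(size_seen=8192, mtime_seen=101.2)   # from node3
print(meta.size, meta.mtime)                   # 16384 101.2: one merged update
```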
Outline
What’s GPFS
Main features of GPFS
Parallelism & Consistency
• Fault Tolerance
• GPFS vs. GFS (Google File System)
• Summary
Fault Tolerance
[Figure: file system nodes connected to shared disks through a switching fabric]
Fault Tolerance
Node Failure
1) Logging & recovery (a sketch follows below)
2) Release the resources held by the failed node
[Figure: file system nodes and shared disks over a switching fabric]
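A toy write-ahead-logging sketch (plain dicts standing in for shared-disk state): the log record reaches shared disk before the update itself, so any surviving node can replay a failed node's log and restore consistency.

```python
log = []        # stands in for the failed node's log on shared disk
metadata = {}   # stands in for metadata blocks on shared disk

def update(key, value):
    log.append((key, value))  # 1. write the log record first
    metadata[key] = value     # 2. then apply the update in place

def recover():
    # Run by a surviving node: replaying is idempotent, so it is safe
    # even for updates that did reach disk before the crash.
    for key, value in log:
        metadata[key] = value
    log.clear()

update("inode7.size", 4096)
log.append(("inode7.size", 8192))  # crash: logged but never applied
recover()
print(metadata["inode7.size"])     # 8192: consistent after replay
```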
Fault Tolerance
Communication Failure
1) Results in an isolated node or a network partition
2) Fence the isolated node (see the sketch below)
3) Log recovery
[Figure: file system nodes and shared disks over a switching fabric]
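A hedged sketch of the fencing decision (hypothetical names; GPFS actually uses a quorum/group service together with disk fencing): the majority side of a partition cuts the isolated nodes off from the disks before replaying their logs, so a still-alive isolated node cannot write mid-recovery.

```python
ALL_NODES = {"n1", "n2", "n3", "n4", "n5"}
fenced = set()

def replay_log(node):
    print(f"replaying {node}'s log on a surviving node")

def handle_partition(reachable):
    """Called on one side of the partition with the set of reachable nodes."""
    if len(reachable) * 2 > len(ALL_NODES):  # are we the majority?
        lost = ALL_NODES - reachable
        fenced.update(lost)                  # 2) fence the isolated nodes
        for node in lost:                    # 3) only then, log recovery
            replay_log(node)
    else:
        print("minority partition: stop serving the file system")

handle_partition({"n1", "n2", "n3"})         # majority side fences n4 and n5
```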
Fault Tolerance
Disk Failure
1) Use dual-attached RAID controllers
2) Replication of data and metadata blocks (see the sketch below)
[Figure: file system nodes and shared disks over a switching fabric]
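A simplified replication sketch (hypothetical layout): each data or metadata block gets two copies on disks from different "failure groups" (disks sharing no single point of failure), so one disk or controller failure loses no data.

```python
disks = {  # disk name -> failure group (disks sharing a controller/path)
    "d1": "groupA", "d2": "groupA",
    "d3": "groupB", "d4": "groupB",
}
storage = {d: {} for d in disks}

def write_block(block_id, data):
    primary = "d1"  # fixed here for simplicity; GPFS stripes across disks
    # The second copy must land in a different failure group.
    backup = next(d for d, g in disks.items() if g != disks[primary])
    for d in (primary, backup):
        storage[d][block_id] = data

write_block("blk-0", b"hello")
print(sorted(d for d in storage if storage[d]))  # ['d1', 'd3']: one per group
```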
Outline
What’s GPFS
Main features of GPFS
Parallelism & Consistency
Fault Tolerance
• GPFS vs. GFS (Google File System)
• Summary
GPFS vs. GFS (Google File System)

|                        | Google File System                              | GPFS                                             |
|------------------------|-------------------------------------------------|--------------------------------------------------|
| Application type       | Large, distributed, data-intensive              | Supercomputing, other data-intensive, other…     |
| Data access assumption | Large streaming r/w; mainly record appends      | None                                             |
| File size assumption   | Usually huge                                    | None                                             |
| Consistency            | Relaxed                                         | -                                                |
| Synchronization        | Centralized management                          | Distributed locking + centralized management     |
| Caching                | Not needed                                      | Needed                                           |
| Data unit              | Chunk (64MB)                                    | Block (typically 256KB)                          |
| Fault tolerance        | Constant monitoring, fast recovery, replication | Logging & recovery, support of RAID, replication |
GPFS vs. GFS - Architecture
GFS: single master + multiple chunkservers + multiple clients
GPFS:
• File system nodes
• Manager nodes (could be any of the file system nodes)
• Storage nodes
GPFS vs. GFS
Google File System:
1) Uses a single master to simplify the design
2) Design decisions specific to its unique setting
3) Pays special attention to component failures
4) Successfully met Google's storage needs and is widely used within Google for production data processing
GPFS:
1) Uses a shared-disk architecture for extreme scalability
2) Combines academic ideas with new approaches to scale to the largest systems
3) Pays special attention to parallelism and scalability
4) Successfully satisfies the needs for throughput, storage capacity and reliability of the largest and most demanding problems
Summary
• GPFS: a parallel, shared-disk file system for cluster computers
• Main features:
– High Performance
• Distributed Locking
• Centralized Management
• Other parallel techniques such as allocation hints and data shipping
– High Availability
• logging + replication + RAID support => Fault Tolerance
– Good Scalability
References
• http://en.wikipedia.org/wiki/GPFS
• Frank Schmuck and Roger Haskin, GPFS: A Shared-Disk File System for Large Computing Clusters, 2002
• Sanjay Ghemawat, Howard Gobioff and Shun‐Tak Leung, The Google File System, 2003
• Benny Mandler, Architectural and Design Issues in the General Parallel File System, 2002
• http://www-03.ibm.com/systems/clusters/software/gpfs/