GPFS: General Parallel File System
11. 20. 2008
Outline
• What’s GPFS
• Main features of GPFS
• Parallelism & Consistency
• Fault Tolerance
• GPFS vs. GFS (Google File System)
• Summary
What is GPFS (1/2)
A parallel, shared‐disk file system for cluster computers
Developed by IBM since 1993
Available on AIX, Linux and MS Windows Server 2003.
Used on 6 of the 10 most powerful supercomputers in the world. Extreme scalability!
What is GPFS (2/2)
Extreme scalability from its shared-disk architecture
Disk number: up to 4k
Disk size: up to 1TB
[Figure: shared-disk architecture: file system nodes connected to shared disks through a switching fabric]
Main Features of GPFS
• High Performance: multiple GB/s to/from a single file
– user data access: parallel!
– metadata access: parallel!
– administrative actions: parallel!
• Parallelism & Consistency (coming soon!)
• High Availability: logging + replication + RAID support => Fault Tolerance
• Good Scalability
Outline
What’s GPFS
Main features of GPFS
• Parallelism & Consistency
• Fault Tolerance
• GPFS vs. GFS (Google File System)
• Summary
Parallelism & Consistency
Two approaches to preserve consistency:
• Distributed Locking
• Centralized Management
Parallel data access:
• file data: distributed byte-range locking
• file metadata: centralized management
• allocation maps: distributed locking + centralized hint
Distributed Locking (1/3)
acquire r/w lock before every data operation
dirty data & metadata flushed to disk before revoke
[Figure: lock-token protocol between two cluster nodes and the token server]
(1) node1 asks the token server to acquire a lock
(2) the token server grants node1 the lock token
(3) node1 holds the lock token on the object
(4) node2 asks to acquire the same lock
(5) the token server revokes node1's token (node1 flushes dirty state first)
(6) the token server grants node2 the lock token
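A minimal Python sketch of this token protocol (hypothetical class and method names; the real GPFS token manager handles many token types and shared as well as exclusive modes):

```python
class TokenServer:
    """Grants lock tokens; revokes the current holder's token on conflict."""

    def __init__(self):
        self.holders = {}  # object id -> node currently holding the token

    def acquire(self, node, obj):
        holder = self.holders.get(obj)
        if holder is not None and holder is not node:
            # Conflicting request: revoke the current holder's token first.
            holder.revoke(obj)
        self.holders[obj] = node


class Node:
    def __init__(self, name, server):
        self.name, self.server = name, server
        self.dirty = set()  # locally cached, not-yet-flushed updates

    def write(self, obj):
        self.server.acquire(self, obj)  # steps (1)-(2): get the lock token
        self.dirty.add(obj)             # cache the update under the token

    def revoke(self, obj):
        # Dirty data & metadata are flushed to disk before the token is given up.
        if obj in self.dirty:
            print(f"{self.name}: flushing {obj} to disk before revoke")
            self.dirty.discard(obj)


server = TokenServer()
node1, node2 = Node("node1", server), Node("node2", server)
node1.write("object A")  # (1)-(3): node1 holds the token
node2.write("object A")  # (4)-(6): server revokes node1, grants node2
```

The point of the token: once node1 holds it, repeated local operations on the same object need no further messages; server traffic occurs only when another node makes a conflicting request.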
Distributed Locking (2/3)
Problem: high bandwidth => multiple nodes write a single file in parallel
Solution: lock byte ranges instead of whole files/blocks [Byte Range Locking]
Assuming node1, node2, … arrive in order and start writing to the same file at offsets c1, c2, …:
T1: node1 holds the token for file range (0, infinity)
T2: node1: (0, c2), node2: (c2, inf), assuming c1 < c2
    or node1: (c1, inf), node2: (0, c1), assuming c1 > c2
T3: node1: (0, c2), node2: (c2, c3), node3: (c3, inf)
… (the splitting continues as more nodes join; see the sketch below)
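A small sketch of how the byte-range tokens end up partitioned (hypothetical helper; in GPFS the ranges are negotiated incrementally between nodes as each write arrives, not computed in one place):

```python
INF = float("inf")

def split_ranges(offsets):
    """offsets[i]: where node i+1 starts writing. Returns the byte-range
    token each node ends up holding after all nodes have joined."""
    order = sorted(range(len(offsets)), key=lambda i: offsets[i])
    tokens = {}
    for pos, i in enumerate(order):
        start = 0 if pos == 0 else offsets[i]  # lowest writer keeps 0..
        end = offsets[order[pos + 1]] if pos + 1 < len(order) else INF
        tokens[f"node{i + 1}"] = (start, end)
    return tokens

print(split_ranges([0]))            # T1: node1 (0, inf)
print(split_ranges([0, 100]))       # T2, c1 < c2: node1 (0, 100), node2 (100, inf)
print(split_ranges([150, 50]))      # T2, c1 > c2: node1 (150, inf), node2 (0, 150)
print(split_ranges([0, 100, 200]))  # T3: node1 (0, 100), node2 (100, 200), node3 (200, inf)
```

Writers at disjoint offsets each end up with a private range, so after the initial negotiation they generate no further token traffic.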
Distributed Locking (3/3)
• Byte-range locking: for user data
• Distributed lock + hints: for allocation maps
Both scale well! (An allocation-map sketch follows below.)
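A rough sketch of the allocation-map idea (hypothetical names and a deliberately simplified model): the map is split into regions, each protected by its own distributed lock, while a central allocation manager only hands out hints about which region a node should try.

```python
class AllocationManager:
    """Central node keeping approximate per-region free-space counts.
    Its answers are hints, not locks: correctness still comes from the
    per-region distributed locks that nodes take before allocating."""

    def __init__(self, n_regions, blocks_per_region):
        self.free_hint = [blocks_per_region] * n_regions
        self.next_region = 0

    def suggest_region(self):
        # Round-robin over regions believed non-empty, so different nodes
        # are steered to different regions and rarely contend for the
        # same region lock.
        n = len(self.free_hint)
        for _ in range(n):
            r = self.next_region
            self.next_region = (self.next_region + 1) % n
            if self.free_hint[r] > 0:
                return r
        raise RuntimeError("file system full")

    def report_usage(self, region, blocks_used):
        self.free_hint[region] -= blocks_used  # keep the hint roughly current


mgr = AllocationManager(n_regions=4, blocks_per_region=1000)
r1 = mgr.suggest_region()  # node1 is steered to region 0
r2 = mgr.suggest_region()  # node2 is steered to region 1
# each node takes the distributed lock on its own region, then allocates
mgr.report_usage(r1, 10)
```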
Centralized Management
All conflicting operations are forwarded to a designated node, which performs the requested read or update.
Used for access to file metadata. Why?
every metadata write => lock conflict (heavy overhead…)
solution: use a metanode to collect/merge metadata updates
Metanode:
• elected dynamically, one per open file
• deleted dynamically
(see the merge sketch below)
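A minimal merge sketch (hypothetical; the real metanode also handles directory and indirect blocks). The useful property is that size and mtime updates are order-independent, so the metanode can merge them without each writer taking an inode lock:

```python
class Metanode:
    """One per open file: collects inode updates from all writing nodes
    and merges them into a single metadata write."""

    def __init__(self):
        self.size, self.mtime = 0, 0.0

    def merge(self, size_seen, mtime_seen):
        # Both fields only grow, so merging just takes the maximum;
        # the arrival order of updates does not matter.
        self.size = max(self.size, size_seen)
        self.mtime = max(self.mtime, mtime_seen)


meta = Metanode()
meta.merge(size_seen=4096, mtime_seen=100.0)   # from node1
meta.merge(size_seen=16384, mtime_seen=99.5)   # from node2
meta.merge(size_seen=8192, mtime_seen=101.2)   # from node3
print(meta.size, meta.mtime)                   # 16384 101.2: one merged update
```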
Outline
What’s GPFS
Main features of GPFS
Parallelism & Consistency
• Fault Tolerance
• GPFS vs. GFS (Google File System)
• Summary
Fault Tolerance
[Figure: file system nodes connected to shared disks through a switching fabric]
Fault Tolerance
Node Failure
1) Logging & recovery (a sketch follows below)
2) Release the resources held by the failed node
[Figure: file system nodes and shared disks over a switching fabric]
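A toy write-ahead-logging sketch (plain dicts standing in for shared-disk state): the log record reaches shared disk before the update itself, so any surviving node can replay a failed node's log and restore consistency.

```python
log = []        # stands in for the failed node's log on shared disk
metadata = {}   # stands in for metadata blocks on shared disk

def update(key, value):
    log.append((key, value))  # 1. write the log record first
    metadata[key] = value     # 2. then apply the update in place

def recover():
    # Run by a surviving node: replaying is idempotent, so it is safe
    # even for updates that did reach disk before the crash.
    for key, value in log:
        metadata[key] = value
    log.clear()

update("inode7.size", 4096)
log.append(("inode7.size", 8192))  # crash: logged but never applied
recover()
print(metadata["inode7.size"])     # 8192: consistent after replay
```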
Fault Tolerance
Communication Failure
1) Results in an isolated node or a network partition
2) Fence the isolated node (see the sketch below)
3) Log recovery
[Figure: file system nodes and shared disks over a switching fabric]
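A hedged sketch of the fencing decision (hypothetical names; GPFS actually uses a quorum/group service together with disk fencing): the majority side of a partition cuts the isolated nodes off from the disks before replaying their logs, so a still-alive isolated node cannot write mid-recovery.

```python
ALL_NODES = {"n1", "n2", "n3", "n4", "n5"}
fenced = set()

def replay_log(node):
    print(f"replaying {node}'s log on a surviving node")

def handle_partition(reachable):
    """Called on one side of the partition with the set of reachable nodes."""
    if len(reachable) * 2 > len(ALL_NODES):  # are we the majority?
        lost = ALL_NODES - reachable
        fenced.update(lost)                  # 2) fence the isolated nodes
        for node in lost:                    # 3) only then, log recovery
            replay_log(node)
    else:
        print("minority partition: stop serving the file system")

handle_partition({"n1", "n2", "n3"})         # majority side fences n4 and n5
```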
Fault Tolerance
Disk Failure
1) Use dual-attached RAID controllers
2) Replication of data and metadata blocks (see the sketch below)
[Figure: file system nodes and shared disks over a switching fabric]
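A simplified replication sketch (hypothetical layout): each data or metadata block gets two copies on disks from different "failure groups" (disks sharing no single point of failure), so one disk or controller failure loses no data.

```python
disks = {  # disk name -> failure group (disks sharing a controller/path)
    "d1": "groupA", "d2": "groupA",
    "d3": "groupB", "d4": "groupB",
}
storage = {d: {} for d in disks}

def write_block(block_id, data):
    primary = "d1"  # fixed here for simplicity; GPFS stripes across disks
    # The second copy must land in a different failure group.
    backup = next(d for d, g in disks.items() if g != disks[primary])
    for d in (primary, backup):
        storage[d][block_id] = data

write_block("blk-0", b"hello")
print(sorted(d for d in storage if storage[d]))  # ['d1', 'd3']: one per group
```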
Outline
What’s GPFS
Main features of GPFS
Parallelism & Consistency
Fault Tolerance
• GPFS vs. GFS (Google File System)
• Summary
GPFS vs. GFS (Google File System)

|                        | Google File System                              | GPFS                                             |
|------------------------|-------------------------------------------------|--------------------------------------------------|
| Application type       | Large, distributed, data-intensive              | Supercomputing, other data-intensive, other…     |
| Data access assumption | Large streaming r/w; mainly record appends      | None                                             |
| File size assumption   | Usually huge                                    | None                                             |
| Consistency            | Relaxed                                         | -                                                |
| Synchronization        | Centralized management                          | Distributed locking + centralized management     |
| Caching                | Not needed                                      | Needed                                           |
| Data unit              | Chunk (64MB)                                    | Block (typically 256KB)                          |
| Fault tolerance        | Constant monitoring, fast recovery, replication | Logging & recovery, support of RAID, replication |
GPFS vs. GFS - Architecture
GFS: single master + multiple chunkservers + multiple clients
GPFS:
• File system nodes
• Manager nodes (could be any of the file system nodes)
• Storage nodes
GPFS vs. GFS
Google File System:
1) Uses a single master to simplify the design
2) Design decisions specific to its unique setting
3) Pays special attention to component failures
4) Successfully met Google's storage needs and is widely used within Google for production data processing
GPFS:
1) Uses a shared-disk architecture for extreme scalability
2) Combines academic ideas with new approaches to scale to the largest systems
3) Pays special attention to parallelism and scalability
4) Successfully satisfies the needs for throughput, storage capacity and reliability of the largest and most demanding problems
Summary
• GPFS: a parallel, shared-disk file system for cluster computers
• Main features:
– High Performance
• Distributed Locking
• Centralized Management
• Other parallel techniques such as allocation hints and data shipping
– High Availability
• logging + replication + RAID support => Fault Tolerance
– Good Scalability
References
• http://en.wikipedia.org/wiki/GPFS
• Frank Schmuck and Roger Haskin, GPFS: A Shared-Disk File System for Large Computing Clusters, 2002
• Sanjay Ghemawat, Howard Gobioff and Shun‐Tak Leung, The Google File System, 2003
• Benny Mandler, Architectural and Design Issues in the General Parallel File System, 2002
• http://www-03.ibm.com/systems/clusters/software/gpfs/