November 18, 2003
Object Storage: Redefining Bandwidth for Linux Clusters
Brent Welch
Principal Architect, Panasas Inc.
Page 2 Panasas
Blocks, Files and Objects
Block-based architecture: fast but private
Traditional SCSI and FC approaches
Expensive fabric, difficult to share between hosts
File-based architecture: sharable, but bottlenecked performance
NAS storage (NFS, CIFS, AFS and DFS)
Filer CPU and memory system between clients and disks
Object-based architecture: fast and sharable
Storage nodes directly accessible by clients via GbE
Out-of-band metadata servers make policy decisions for a file system
Storage nodes enforce access control to allow safe sharing
Key Object Storage Advantages
Robust, shared access by many clients
Scalable performance via an offloaded data path
Strong fine-grained end-to-end security
Object Storage System Architecture
Moves low-level storage functions into the storage device itself
Key Object Storage Features
Intelligent space management in storage layer
Media geometry aware placement
Late binding allocation
Data aware prefetching, caching & recovery
Encapsulation of data and attributes
Native object interface, good programming model
Storage interpreted attributes for per file properties
What is an Object?
Object
Comprised of: user data and attributes
Interface: ID <dev#, grp#, obj#>; Read/Write; Create/Delete; Getattr/Setattr; capability-based
File component: stripe files across component objects
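The interface above can be sketched as a minimal in-memory model. The identifier triple <dev#, grp#, obj#> and the operation names come from the slide; the class name, the striping helper, and all internals are illustrative assumptions, not the Panasas or T10 OSD API.

```python
from dataclasses import dataclass, field

@dataclass
class StorageObject:
    dev: int                                      # device number
    grp: int                                      # object group number
    obj: int                                      # object number
    data: bytearray = field(default_factory=bytearray)
    attrs: dict = field(default_factory=dict)     # encapsulated attributes

    def write(self, offset: int, buf: bytes) -> None:
        end = offset + len(buf)
        if end > len(self.data):                  # grow the object on demand
            self.data.extend(b"\x00" * (end - len(self.data)))
        self.data[offset:end] = buf

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])

    def setattr(self, key: str, value) -> None:
        self.attrs[key] = value

    def getattr(self, key: str):
        return self.attrs.get(key)

def stripe_write(components: list, stripe_unit: int, offset: int, buf: bytes) -> None:
    """Round-robin a file write across component objects
    (offset assumed stripe-aligned, for simplicity)."""
    for i in range(0, len(buf), stripe_unit):
        unit_no = (offset + i) // stripe_unit               # stripe unit index
        comp = components[unit_no % len(components)]        # which component object
        comp_off = (unit_no // len(components)) * stripe_unit
        comp.write(comp_off, buf[i:i + stripe_unit])
```

For example, striping b"abcdef" with a 2-byte stripe unit over three component objects places "ab", "cd", and "ef" on components 0, 1, and 2.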
Scalability: Capacity
Balanced storage node
CPU, SDRAM, GE NIC and 2 spindles
Commodity parts drive low cost
Drive linear performance gains
Simply add StorageBlades
Single Seamless Namespace!
Scalability: Management
Single filesystem namespace
Removes physical & logical boundaries
Dynamic load-balancing
Interoperability
Gateway for NFS/CIFS
“Free” clustered NAS
Internal cluster management
Fault tolerance
Environmental/thermal monitoring
Software upgrades
Service and Support
Personalized extranet for bugs, SRs, orders
Single Global Namespace
Panasas ActiveScale Architecture
(Diagram: one namespace spanning Eng. Developers, Eng. QA, and Marketing volumes)
Scalability: Metadata
Scaling
Block-level metadata controlled by Storage Blades (OSDs)
Client caching with callbacks to reduce load for file-level metadata
Clustered servers (Director Blades) with active/active failover
Metadata provides file system semantics over objects
Chunk ownership over collections of files and directories
For really large directories, hash into different collections
Store metadata with the objects on storage nodes
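The hashing idea for very large directories can be sketched as follows. The slide says only that entries hash into different collections; the collection count, hash function, and key construction here are assumptions for illustration.

```python
import hashlib

NUM_COLLECTIONS = 8  # assumed; a real system would size and rebalance this

def collection_for(dir_id: int, entry_name: str) -> int:
    """Map one entry of a large directory to the metadata collection
    (and thus the metadata server) responsible for it, so that no
    single server owns the whole directory."""
    h = hashlib.sha1(f"{dir_id}/{entry_name}".encode()).digest()
    return int.from_bytes(h[:4], "big") % NUM_COLLECTIONS
```

Because the mapping depends only on the directory ID and entry name, any client or server computes the same owner without a lookup.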
Bandwidth
Sustained throughput over 60 seconds, N clients to N files
1 Client, 10 OSDs: 95 MB/s read, 77 MB/s write
10 Clients, 10 OSDs: 415 MB/s read, 335 MB/s write
151 Clients, 299 OSDs: 10,334 MB/s read
Barrier-synchronized 1 TB move (MPI-IO “min” time)
151 Clients, 299 OSDs: N to N, 7,486 MB/s read, 6,506 MB/s write
151 Clients, 198 OSDs: 2,775 MB/s concurrent write to one file
Clients are mostly 2.4 GHz uni-processors
Large tests had a mix, some duals, some faster
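As a sanity check on the figures above (simple arithmetic, numbers taken from this slide):

```python
# Per-node rates implied by the 151-client, 299-OSD sustained read result.
aggregate_read_mb_s = 10334

per_client_read = aggregate_read_mb_s / 151   # ≈ 68.4 MB/s per client
per_osd_read = aggregate_read_mb_s / 299      # ≈ 34.6 MB/s per OSD

# Both rates sit below GbE line rate (~110 MB/s usable), suggesting the
# network ports themselves were not the bottleneck at this scale.
```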
GE Networking
GE NICs part of commodity-based storage and compute clusters
Cluster-specific interconnects optimized for bandwidth, latency
Storage interconnects optimized for cost, longevity
Cluster nodes get devoted to being I/O “routers”
Multiprotocol switch: bridge inside cluster switch
Eliminates I/O node, two switch ports
Object Storage Acceptance
Los Alamos Labs buying up to 620 TB through FY04
Business Objective
5X capability at 10% the cost of today’s system
Requirements
Linux commodity cluster
100+ Teraflops
Throughput Goal: 1 GB/sec per Teraflop = 100 GB/sec
Object storage testing at scale
120 TB Panasas storage installed
Option to buy up to 500 TB in FY04
Life Science
Gov’t Science
Oil and Gas
The premier storage system for scalable Linux clusters
Fine Grain Access Enforcement
State of the art is a VPN spanning all out-of-band clients and all sharable data and metadata
Accident-prone and vulnerable to a subverted client; analogous to single-address-space computing
Object Storage uses digitally signed, object-specific capabilities on each request

Protocol between Client, File Manager, and NASD StorageBlade (File Manager and StorageBlade share a Secret Key; the File Manager–Client channel is private, and the Client–StorageBlade channel adds integrity/privacy):
1: Request for access (Client → File Manager)
2: CapArgs, CapKey (File Manager → Client)
3: CapArgs, Req, NonceIn, ReqMAC (Client → StorageBlade)
4: Reply, NonceOut, ReplyMAC (StorageBlade → Client)

CapArgs = ObjID, Version, Rights, Expiry, …
CapKey = MAC_SecretKey(CapArgs)
ReqMAC = MAC_CapKey(Req, NonceIn)
ReplyMAC = MAC_CapKey(Reply, NonceOut)
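The MAC relationships above can be exercised with a short sketch. The slide defines only the MAC structure, not wire formats, so HMAC-SHA1 as the MAC function and all byte encodings below are assumptions.

```python
import hmac
import hashlib

def mac(key: bytes, *parts: bytes) -> bytes:
    """MAC_key(parts...) — HMAC-SHA1 assumed as the MAC function."""
    m = hmac.new(key, digestmod=hashlib.sha1)
    for p in parts:
        m.update(p)
    return m.digest()

# File Manager and StorageBlade share SecretKey; the client never sees it.
secret_key = b"shared-by-file-manager-and-storageblade"

# Steps 1-2: client requests access; File Manager returns CapArgs and CapKey.
cap_args = b"ObjID=42;Version=1;Rights=READ;Expiry=20031231"  # illustrative encoding
cap_key = mac(secret_key, cap_args)           # CapKey = MAC_SecretKey(CapArgs)

# Step 3: client signs each request with CapKey.
req, nonce_in = b"READ offset=0 len=4096", b"nonce-in-1"
req_mac = mac(cap_key, req, nonce_in)         # ReqMAC = MAC_CapKey(Req, NonceIn)

# The StorageBlade re-derives CapKey from CapArgs and its own SecretKey,
# so it can verify the request without contacting the File Manager.
assert hmac.compare_digest(mac(mac(secret_key, cap_args), req, nonce_in), req_mac)

# Step 4: StorageBlade signs its reply with the same CapKey.
reply, nonce_out = b"reply-payload", b"nonce-out-1"
reply_mac = mac(cap_key, reply, nonce_out)    # ReplyMAC = MAC_CapKey(Reply, NonceOut)
```

A client that alters Rights in CapArgs changes the re-derived CapKey, so its ReqMAC no longer verifies; this is what lets the storage node enforce access control without holding per-file policy.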
Objects: Performance & Scalability
Breakthrough Data Throughput AND Random I/O
32 shelves, 600 spindles: 305,805 SFS ops/sec random I/O and 10 GB/sec data throughput
Standardization Timeline
SNIA TWG is nearing completion of proposed OSD standard
Great participation by leading storage industry vendors
ANSI X3 T10 V1 standard should be in review – November ‘03
Next step for the OSD spec is under development
Roadmap includes SMI-S support & Information Lifecycle Management
(Timeline, 1995–2005: CMU NASD research, then NSIC NASD; Lustre and Panasas follow; T10/SNIA OSD standardization opens the OSD market)
Ease of Management
Problem: Management is 80% of Storage TCO
Multiple physical & logical management sets
Ongoing adjustments to maintain efficiency
Security breaches
System backup, downtime and recovery
Single Namespace, Dynamic Load Balancing, Quality of Service
(Chart: initial purchase cost vs. ongoing management costs, 80% of TCO)
Panasas redefines appliance-like simplicity