November 18, 2003 Object Storage: Redefining Bandwidth for Linux Clusters Brent Welch Principal Architect, Panasas Inc.




Blocks, Files and Objects

Block-based architecture: fast but private

Traditional SCSI and FC approaches

Expensive fabric, difficult to share between hosts

File-based architecture: sharable, but bottlenecked performance

NAS storage (NFS, CIFS, AFS and DFS)

Filer CPU and memory system between clients and disks

Object-based architecture: fast and sharable

Storage nodes directly accessible by clients via GbE

Out-of-band metadata servers make policy decisions for a file system

Storage nodes enforce access control to allow safe sharing


Key Object Storage Advantages

Robust, shared access by many clients

Scalable performance via an offloaded data path

Strong fine-grained end-to-end security

Object Storage System Architecture: moves low-level storage functions into the storage device itself

Key Object Storage Features

Intelligent space management in storage layer

Media geometry aware placement

Late binding allocation

Data aware prefetching, caching & recovery

Encapsulation of data and attributes

Native object interface, good programming model

Storage interpreted attributes for per file properties


What is an Object?

Object

Comprised of:

User data

Attributes

Interface:

ID <dev#,grp#,obj#>

Read/Write

Create/Delete

Getattr/Setattr

Capability-based

File Component:

Stripe files across component objects
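The object interface and the striping of a file across component objects can be sketched as follows. This is a minimal illustration, not the Panasas implementation: the class name `StorageObject`, the helper `locate`, the `length` attribute, and the round-robin layout with a fixed stripe unit are all assumptions made for the example. Capability checks are left out here.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Object ID as on the slide: <dev#, grp#, obj#>
ObjectId = Tuple[int, int, int]


@dataclass
class StorageObject:
    """Hypothetical in-memory object: user data plus attributes."""
    oid: ObjectId
    data: bytearray = field(default_factory=bytearray)
    attrs: Dict[str, object] = field(default_factory=dict)

    def write(self, offset: int, payload: bytes) -> None:
        end = offset + len(payload)
        if end > len(self.data):
            # Grow the object with zero bytes so the write fits.
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = payload
        # The storage device maintains this attribute itself (assumed name).
        self.attrs["length"] = len(self.data)

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])

    def get_attr(self, name: str) -> object:
        return self.attrs.get(name)

    def set_attr(self, name: str, value: object) -> None:
        self.attrs[name] = value


def locate(file_offset: int, stripe_unit: int, n_components: int) -> Tuple[int, int]:
    """Map a file offset to (component index, offset inside that component).

    Assumes a simple round-robin layout; real layouts may differ.
    """
    stripe_no = file_offset // stripe_unit
    component = stripe_no % n_components
    comp_offset = (stripe_no // n_components) * stripe_unit + file_offset % stripe_unit
    return component, comp_offset
```

With a 64-byte stripe unit and four component objects, `locate(64, 64, 4)` lands at the start of the second component, so consecutive stripe units of one file can be served by different storage nodes in parallel.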


Scalability: Capacity

Balanced storage node

CPU, SDRAM, GE NIC and 2 spindles

Commodity parts drive low cost

Drive linear performance gains

Simply add StorageBlades

Single Seamless Namespace!


Scalability: Management

Single filesystem namespace

Removes physical & logical boundaries

Dynamic load-balancing

Interoperability

Gateway for NFS/CIFS

“Free” clustered NAS

Internal cluster management

Fault tolerance

Environmental/thermal monitoring

Software upgrades

Service and Support

Personalized extranet for bugs, SRs, orders

Single Global Namespace

Panasas ActiveScale Architecture

[Figure labels: Eng. Developers, Eng. QA, Marketing]


Scalability: Metadata

Scaling

Block-level metadata controlled by Storage Blades (OSDs)

Client caching with callbacks to reduce load for file-level metadata

Clustered servers (Director Blades) with active/active failover

Metadata provides file system semantics over objects

Chunk ownership over collections of files and directories

For really large directories, hash into different collections

Store metadata with the objects on storage nodes
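The idea of hashing a very large directory into different collections can be sketched as below. The slide does not say which hash or key is used, so the function name `collection_for`, the choice of SHA-256, and the use of a directory ID plus entry name as the key are assumptions for illustration only.

```python
import hashlib


def collection_for(dir_id: int, name: str, n_collections: int) -> int:
    """Pick a metadata collection for one directory entry.

    Hypothetical scheme: hash (directory ID, entry name) and reduce it
    modulo the number of collections, so that the entries of one huge
    directory are spread over several metadata owners.
    """
    key = f"{dir_id}/{name}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_collections
```

The mapping is deterministic, so any client or metadata server that knows the number of collections can find the owner of an entry without a lookup table.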


Bandwidth

Sustained Throughput 60 seconds, N clients to N files

1 Client, 10 OSDs: 95 MB/s read, 77 MB/s write

10 Clients, 10 OSDs: 415 MB/s read, 335 MB/s write

151 Clients, 299 OSDs: 10334 MB/s read

Barrier synchronized 1 TB move (MPI IO “min” time)

151 Clients, 299 OSDs: N to N, 7486 MB/s read, 6506 MB/s write

151 Clients, 198 OSDs: 2775 MB/s concurrent write to one file

Clients are mostly 2.4 GHz uni-processors

Large tests had a mix, some duals, some faster
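To compare the runs above, it helps to divide each aggregate figure by the number of clients and by the number of OSDs. The short script below does only that arithmetic on the numbers quoted on this slide; the labels and the helper name `per_unit` are made up for the example.

```python
# (label, clients, OSDs, aggregate MB/s), copied from the figures above
RESULTS = [
    ("sustained read", 1, 10, 95),
    ("sustained write", 1, 10, 77),
    ("sustained read", 10, 10, 415),
    ("sustained write", 10, 10, 335),
    ("sustained read", 151, 299, 10334),
    ("1 TB move, N-to-N read", 151, 299, 7486),
    ("1 TB move, N-to-N write", 151, 299, 6506),
    ("concurrent write to one file", 151, 198, 2775),
]


def per_unit(clients: int, osds: int, total_mb_s: float):
    """Return (MB/s per client, MB/s per OSD) for one run."""
    return total_mb_s / clients, total_mb_s / osds


for label, clients, osds, total in RESULTS:
    per_client, per_osd = per_unit(clients, osds, total)
    print(f"{label:30s} {clients:4d} clients {osds:4d} OSDs "
          f"{per_client:7.1f} MB/s/client {per_osd:6.1f} MB/s/OSD")
```

For the largest sustained read, this works out to roughly 34.6 MB/s per OSD and about 68.4 MB/s per client.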


GE Networking

GE NICs part of commodity-based storage and compute clusters

Cluster-specific interconnects optimized for bandwidth, latency

Storage interconnects optimized for cost, longevity

Cluster nodes get devoted to being I/O “routers”

Multiprotocol switch: bridge inside cluster switch

Eliminates I/O node, two switch ports


Object Storage Acceptance

Los Alamos Labs buying up to 620 TB through FY04

Business Objective

5X capability at 10% the cost of today’s system

Requirements

Linux commodity cluster

100+ Teraflops

Throughput Goal: 1 GB/sec per Teraflop = 100 GB/sec

Object storage testing at scale

120 TB Panasas storage installed

Option to buy up to 500 TB in FY04

Life Science

Gov’t Science

Oil and Gas


The premier storage system for scalable Linux clusters


Fine Grain Access Enforcement

State of art is VPN of all out-of-band clients, all sharable data and metadata

Accident prone & vulnerable to subverted client; analogy to single-address space computing

Object Storage uses digitally signed, object-specific capabilities on each request

[Figure: File Manager, Client, and NASD exchanging messages; labels: Secret Key (twice), Private Communication, StorageBlade Integrity/Privacy]

1: Request for access

2: CapArgs, CapKey

3: CapArgs, Req, NonceIn, ReqMAC

4: Reply, NonceOut, ReplyMAC

CapArgs = ObjID, Version, Rights, Expiry,…

CapKey = MAC_SecretKey(CapArgs)

ReqMAC = MAC_CapKey(Req, NonceIn)

ReplyMAC = MAC_CapKey(Reply, NonceOut)
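The capability exchange above can be sketched with a keyed MAC. This is an illustration under stated assumptions, not the NASD or Panasas code: the slide does not name the MAC algorithm, so HMAC-SHA-256 is used here; the function names, the byte encoding of CapArgs, and the length-prefixed framing of MAC inputs are all invented for the example. It also reflects one reading of the diagram, namely that the file manager and the storage node share SecretKey, so the storage node can recompute CapKey from the CapArgs sent with each request. Checking Rights and Expiry is omitted.

```python
import hashlib
import hmac


def mac(key: bytes, *parts: bytes) -> bytes:
    """HMAC-SHA-256 over length-prefixed parts (algorithm and framing assumed)."""
    h = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


def issue_capability(secret_key: bytes, cap_args: bytes):
    """File manager, step 2: CapKey = MAC_SecretKey(CapArgs)."""
    return cap_args, mac(secret_key, cap_args)


def sign_request(cap_key: bytes, req: bytes, nonce_in: bytes) -> bytes:
    """Client, step 3: ReqMAC = MAC_CapKey(Req, NonceIn)."""
    return mac(cap_key, req, nonce_in)


def storage_verify(secret_key: bytes, cap_args: bytes, req: bytes,
                   nonce_in: bytes, req_mac: bytes) -> bool:
    """Storage node: recompute CapKey from CapArgs, then check ReqMAC."""
    cap_key = mac(secret_key, cap_args)
    expected = mac(cap_key, req, nonce_in)
    return hmac.compare_digest(expected, req_mac)


def storage_reply_mac(secret_key: bytes, cap_args: bytes,
                      reply: bytes, nonce_out: bytes) -> bytes:
    """Storage node, step 4: ReplyMAC = MAC_CapKey(Reply, NonceOut)."""
    cap_key = mac(secret_key, cap_args)
    return mac(cap_key, reply, nonce_out)
```

In this sketch the client never sees SecretKey. It can sign requests only for the CapArgs it was handed, and changing either the request or the CapArgs makes verification fail at the storage node.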


Objects: Performance & Scalability

Breakthrough Data Throughput AND Random I/O

[Charts: Random I/O; Data Throughput]

32-shelf, 600 spindles: 305,805 SFS ops/sec, 10 GB/sec


Standardization Timeline

SNIA TWG is nearing completion of proposed OSD standard

Great participation by leading storage industry vendors

ANSI X3 T10 V1 standard should be in review – November ‘03

Next step for the OSD spec is under development

Roadmap includes SMIS support & Information Life Cycle management

[Timeline figure, 1995 to 2005: CMU NASD, NSIC NASD, Lustre, Panasas, T10/SNIA OSD, OSD market]


Ease of Management

Problem: Management is 80% of Storage TCO

Multiple physical & logical management sets

Ongoing adjustments to maintain efficiency

Security breaches

System backup, downtime and recovery

Single Namespace, Dynamic Load Balancing, Quality of Service

[Chart: initial purchase cost vs. ongoing management costs (80%)]

Panasas redefines Appliance-like simplicity