A Beginner’s Guide To Next Generation Object Storage
DDN | Whitepaper
Tom Leyden, Director of Product Marketing WOS
ddn.com ©2013 DataDirect Networks. All Rights Reserved.


Executive Summary

Object Storage is the new storage paradigm. There is a high level of interest from organizations, as this new approach resolves the challenges of efficiently storing massive volumes of unstructured data - Big Unstructured Data. This paper addresses the why, what and how of object storage.

Why should companies use Object Storage for unstructured data and how is it different from NAS or SAN? The biggest problem with traditional approaches is scalability. NAS lacks the ability to scale as a single system, especially in petabyte environments. Today’s SANs are already complex when deployed with a file system layer on top. Scaling out makes the problem a lot worse.

Object Storage is essentially just a different way of storing, organizing and accessing data on disk. An Object Storage platform provides a storage infrastructure to store files with lots of metadata added to them – referred to as objects. The backend architecture of an object storage platform is designed to present all the storage nodes as one single pool. With Object Storage, there is no file system hierarchy. The architecture of the platform, and its new data protection schemes (vs. RAID, the de-facto data protection scheme for SAN and NAS), allow this pool to scale to a virtually unlimited size, while keeping the system simple to manage.

Users access object storage through applications that typically use a REST API (an internet protocol, optimized for online applications). This makes object storage ideal for all online and cloud environments. When objects are stored, an identifier is created to locate the object in the pool. Applications can very quickly retrieve the right data for the users through the object identifier or by querying the metadata (information about the objects, like the name, when it was created, by whom, etc.). This approach enables significantly faster access and much less overhead than locating a file through a traditional file system.

DDN | WOS is a true object storage platform, designed to scale beyond petabytes as a single system, optimizing TCO

without compromising performance or durability. This makes WOS a perfect platform for a variety of storage cloud

solutions, including online collaboration, active archives, cloud backup and worldwide data distribution.

Figure 1 Object Storage Architecture


Table of Contents

Executive Summary
History of Object Storage
    SAN vs NAS
    Object Storage, The Third Paradigm
    Cloud Storage, Storage Clouds, Object Storage
    REST APIs
    Object Storage Summary
Why Object Storage?
    Massive Data Growth
    Always Online
    Power to the Applications
    The Big Data Explosion
    We All Use Object Storage Every Day
    Use Cases
How Does Object Storage Work?
    Issues with File Storage
    Data Protection: Erasure Coding or Not?
WOS
    True Object Storage Platform
    Optimized for Small and Large Files
    Choice of Data Protection Schemes
    Self-healing Architecture
    Single Storage Infrastructure
    Widest Selection of Interfaces; Out of the Box Applications
    Enterprise-grade Platform
    WOS Benefits
Ecosystem
Resources


History of Object Storage

Object Storage is not a new concept; it even predates the 2006 launch of Amazon’s S3®. Many advances have been made since, and the current generation of Object Storage platforms cannot be compared to the earlier generations, which were merely black boxes designed to store immutable copies of documents, mostly for compliance environments. EMC® Centera®, based on Content Addressed Storage (CAS) innovator Filepool, was one such early implementation of an object-based construct. Today, Centera users face a big challenge in moving to newer, faster storage infrastructures: Centera provides no interfaces to migrate data to other platforms.

The current generation of object storage platforms is designed with this “openness” and flexibility in mind. Most platforms support a subset of Amazon’s REST API and some platforms are designed to be independent of the hardware platform. The industry has learned some tough lessons from using proprietary systems. One initiative to prevent vendor lock-in is SNIA’s Cloud Data Management Interface (CDMI). This is a set of pre-defined RESTful HTTP operations “for assessing the capabilities of the cloud storage system, allocating and accessing containers and objects, managing users and groups, implementing access control, attaching metadata, billing, moving data between cloud systems, exporting data, etc.” [1]

What is Object Storage?

Object Storage is essentially just a different way of storing, organizing and accessing data on disk. To really understand how object storage is different from traditional storage platforms, it is important to understand the “what and how” of traditional storage, and what the challenges are [2].

Figure 2 Object Storage Timeline

[1] From Wikipedia: http://en.wikipedia.org/wiki/Cloud_Data_Management_Interface

[2] For the sake of brevity, we will stick to the very basics. There are hundreds, if not thousands, of blog articles and papers about this topic available online.


SAN vs NAS

A SAN is a block storage device, not that different from an external USB disk drive, just bigger. Systems connect to a SAN with a block interface; common protocols for block storage include iSCSI, Fibre Channel, Fibre Channel over Ethernet (FCoE), etc. A device attaching to a SAN will see the storage presented as a disk drive. SANs allow multiple servers to share a pool of storage that cannot be accessed by individual users. This prevents users from overwriting each other’s data. SANs are typically used by large applications, such as enterprise databases, that handle data locking through the application. SAN storage can be presented as a file system (by putting a file system layer on top), which is generally referred to as a clustered file system. As we will explain later in this document, SANs are complex systems to manage, especially when used for file storage.

A NAS is a file storage device. NAS exposes its storage as a network file system. Devices that attach to a NAS see a mountable file system. Common protocols for file storage devices include NFS and SMB / CIFS. A NAS operates at the file level and is accessible to users with proper access rights - so it needs to manage user privileges, file locking and other security measures. A NAS environment is a much better fit than a SAN for storing files.

Figure 3 Simplified SAN infrastructure with clustered file system and enterprise applications


Object Storage, The Third Paradigm

So, if NAS and SAN can store files, then why Object Storage? How is it different? As we will explain in greater detail below, the biggest problem with both systems is scalability. NAS cannot scale as a single system in petabyte-size environments. Scaling out a NAS environment requires either a combination of multiple systems (each adding management effort) or forklift upgrades with labor-intensive data migration projects. As we mentioned before, SANs are pretty complex when deployed with a file system layer on top. Scaling out makes the problem a lot worse, again with lots of management.

Also, most of the unstructured data that is stored online (or in active archives) is immutable data – meaning the file will not be modified. Much of the functionality built into traditional file systems addresses user access rights and permissions for appending and amending files. These complex functions create a lot of overhead in terms of performance, IOs required to access data and the ability to scale. Object Storage does not have this functionality. If a user modifies a file, the new version is simply stored as a new object. This results in a much simpler architecture than traditional file systems have.

An Object Storage platform is a storage infrastructure to store … objects. For now we will refer to objects as similar to files (collections of data blocks with metadata); later in this document we will explain how this is actually only partly true. The backend architecture of an object storage platform is designed so that all the storage nodes are presented as one single pool. There is no file system hierarchy. The architecture of the platform, and new data protection schemes (vs. RAID, the de-facto data protection scheme for SAN and NAS), allow this pool to scale to virtually unlimited capacities, while keeping the system simple to manage.

Users access object storage through applications that will typically use a REST API. They use a set of simple commands: GET (read), PUT (save) and DELETE. REST is an internet protocol, optimized for online applications. This makes object storage ideal for all online and cloud environments. When objects are stored, an identifier is created to locate the object in the pool. Applications can very quickly retrieve the right data for the users through the object identifier - or by querying the metadata (information about objects: name, when it was created, by whom, etc.). This is much faster than attempting to locate a file through a traditional file system. Applications also handle user access management. Each time a file (object) is changed, it is stored as a new object. This prevents corruption through simultaneous changes.
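The behaviour described above can be illustrated with a minimal in-memory sketch. The class `FlatObjectStore` and its method names are hypothetical and chosen for illustration only; they are not the API of WOS, S3 or any other product. The sketch assumes a flat pool keyed by a generated identifier, and shows an edit being stored as a new object instead of changing the existing one.

```python
import uuid


class FlatObjectStore:
    """Toy object store: one flat pool, no directory hierarchy (illustrative only)."""

    def __init__(self):
        self._pool = {}  # object ID -> (data, metadata)

    def put(self, data, metadata=None):
        """Store data as a new immutable object and return its generated ID."""
        object_id = uuid.uuid4().hex
        self._pool[object_id] = (bytes(data), dict(metadata or {}))
        return object_id

    def get(self, object_id):
        """Return the data stored under the given ID."""
        return self._pool[object_id][0]

    def delete(self, object_id):
        """Remove the object from the pool."""
        del self._pool[object_id]


store = FlatObjectStore()
v1 = store.put(b"draft", {"name": "report.txt"})
# An "edit" never touches v1; the new version is saved as a separate object.
v2 = store.put(store.get(v1) + b" + changes", {"name": "report.txt"})
```

Because the first version is never rewritten, two writers cannot corrupt each other's data; each simply receives a different identifier.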

Figure 4 NAS Storage is presented as a file system to the clients


Cloud Storage, Storage Clouds, Object Storage

What is the difference between Cloud Storage and Storage Clouds? How does Object Storage fit? The answer is pretty straightforward. Cloud Storage is the storage used for Compute Cloud infrastructures - in other words: to run VMs on. Compute Clouds are very IOPS intensive and usually block storage is used in these applications. Storage Clouds are “storage in the cloud”, whether public or private. So, Storage Clouds are simply storage capacity that is made available through the Internet. Most of today’s storage clouds use object storage technologies.

REST APIs

REST stands for Representational State Transfer. It is a software architectural style that is used for distributed application environments, such as the internet. An API, short for Application Programming Interface, is an interface used for an application (client) to talk to its environment (backend servers, storage, databases etc.). With the success of cloud-style computing (running applications in the cloud, rather than on the user’s computer), REST APIs have become the predominant interface for cloud applications to connect to the cloud. For storage-centric cloud applications, a REST API is the interface between the application and the object storage platform.

Figure 5 Scale out object storage with simple REST API for applications


The three most common commands in REST API’s for object storage environments are GET, PUT and DELETE, which

are the equivalents of reading a file, saving a file (technically “save as” – because object storage does not allow you to

update an object), or deleting a file.

Since the early days of Cloud Computing, there’s been a lot of discussion about standardizing on a specific REST API

to avoid vendor lock-in. The general idea behind this is, if all vendors (of applications, cloud infrastructures, object

storage platforms etc.) use a standard API, users will never be locked-in to a specific environment. Without having to

reprogram their applications, they would be able to freely move their data from one platform to another - or keep

it on more than one platform. Little progress has been made on the standardization front however, and the result is

that object storage platforms will either support the Amazon S3 API, the OpenStack API or a native API (i.e. an API of

their own, typically a very easy to use, lightweight interface).

Object Storage Summary

• Data is stored as objects in one large, scalable pool of storage

• Objects are stored with metadata – information about the object

• An Object ID is stored, to locate the data

• REST is the standard interface, simple commands used by applications

• Objects are immutable; edits are saved as a new object

Why Object Storage?

Massive Data Growth

Depending on which analyst firm you talk to, you will hear storage growth predictions that vary between 30x and 40x for the next decade. That means we will all be storing 30 to 40 times as much digital data ten years from now, compared to today. At the same time, companies will only invest an additional 50% in personnel to manage their storage infrastructures. With 30 times the data and 1.5 times the staff, the average storage operator will have to manage roughly 20 times as much storage a decade from now, or more. This will drive the need for storage platforms that require little management effort and scale out to virtually unlimited capacities.

Always Online

Much of that data growth is driven by the recent innovations in cloud and mobile computing. We already mentioned Amazon S3, but there are also Google®, Facebook®, Apple® and several smaller public storage cloud offerings that set a new level of expectations where all data needs to be available anywhere, at any time.


Power to the Applications

File-based storage platforms not only fail to scale sufficiently, they also become obsolete as more and more applications are designed to use REST APIs (the default interface for object storage platforms) to talk directly to the storage, without additional (file system) layers in between. This greatly simplifies architectures and delivers significant performance gains.

The Big Data Explosion

Essentially, there are three types of Big Data: Big Structured Data, Big Semi-structured Data (Big Data Analytics) and Big Unstructured Data. All three exhibit one or more of the “three V’s”, the commonly accepted definition of Big Data: “Big Data refers to any set of data that comes in great Volumes, has a large Variety of information and/or is consumed at high Velocity.”

Big Structured Data refers to large enterprise databases. Velocity is key here, hence the success of superfast SSDs. Big Semi-structured Data refers to massive volumes of small log files (often sensor information) that are collected for analytics. Therefore, we also talk about Big Data Analytics. This data is stored in distributed frameworks that support distributed processing. Think of Hadoop® and MapReduce. Object Storage is not commonly used for structured or semi-structured Big Data, unless it is for archival purposes.

The sweet spot for object storage is Big Unstructured Data, which refers to all data that users best understand as files. Think of image and movie data – always growing and always in higher resolution – music files, office documents etc. Analysts believe that 80% or more of the expected data growth will be unstructured data. File-based storage platforms cannot support this growth. This is the problem Object Storage solves.


We All Use Object Storage Every Day

Possibly the largest object storage infrastructure, and one of the drivers for the adoption of object storage, is Amazon’s S3. This “public” storage cloud service was launched in 2006 and has stimulated many application developers to have their applications use S3 as backend storage. The benefits were clear: no hassle with a private infrastructure, relatively low cost, pay as you go, scale as needed and a very simple interface.

However, while Amazon advertises very low cost to store data on their infrastructure, there are some hidden costs

such as network traffic. At a certain volume of data, there is a point of cost inflection. Many of the startups that

launched on Amazon over the past years and who clearly see the benefits of object storage, are now deploying their

own infrastructure using object storage platforms.

Also, not everyone wants their data in a public environment with debatable SLA’s and moderate security at best.

More and more enterprises are choosing to deploy their own internal storage clouds to facilitate cloud-based

applications. These infrastructures need similar or better availability and durability than the available public services.

Use Cases

Object Storage is more than a smarter paradigm that allows you to store large volumes of unstructured data. Features like massive scalability, REST APIs and geographic distribution enable a series of compelling use cases. An interesting side effect is that solutions tend to overlap. Dropbox® is not just file sharing, it’s backup, collaboration, archiving and mobile storage. Here are a few popular use cases:

Online Web Services: As we mentioned earlier, one of the drivers for object storage is the trend to use more

and more online cloud applications. Previously, without Amazon’s S3, none of this would have been possible.

The more successful web services companies are now gradually making the move to in-house infrastructures.

Also, with corporate security policies, IP and compliance considerations, most enterprises prefer to run cloud

applications on private storage infrastructures.

File Sharing is by far the most popular object storage use case. Dropbox offered a solution for a need that

most of us did not know we had. Today, service providers are now deploying similar services and enterprises are

deploying private file sharing services – as people utilize a variety of devices at home, at work and on-the-go.

They collaborate with people across the office or around the world.

Cloud Backup is increasingly popular. There are dozens and dozens of online services for backup. For

enterprises, the idea of backing up to low cost, highly scalable disk infrastructures - rather than tape, which can

be cumbersome for recovery - is also very compelling.


Cloud Archives: Data archiving decisions used to be very simple: data that was infrequently accessed was

moved off disk to tape. Very few arguments could beat the low TCO of tape. Disk archives were hard to justify

and reserved for those exceptional use cases where latency outweighed the huge cost of disk archives. With

object storage, it is now possible to deploy disk archives at an acquisition cost and TCO close to that of tape.

Many organizations are opting for hybrid environments - with a really, really superfast “hot” disk tier and a very

cheap “cold” tape tier.

Worldwide Collaboration: Globally distributed teams have become standard practice. Think of researchers

from different institutions working on the same project. Think of a movie being shot in New Zealand and

produced in Los Angeles - or software being developed in California and then tested in India. Geographically

distributed storage pools enable teams to work in real-time on the same datasets.

How Does Object Storage Work?

Issues with File Storage

As we explained earlier in this document, file-based storage is a great concept: users can access the same resources through a corporate network. The file system takes care of permissions and access rights, and prevents users from overwriting each other’s data. File systems can even present data in a hierarchical “Directory” structure, which until now has been a very useful tool to keep data organized. The underlying software for such file systems contains a lot of “ingenuity”, which rapidly becomes “complexity” when scaling out the infrastructure.

The concept of a “file” on a computer system is so well ingrained that it is often difficult to think of computer storage

in different terms. It is clearly a very powerful and natural way to think of data. Object Storage is distinct from “file

storage”, but in some ways is even a more natural and powerful way to organize data.

The “file” concept is an abstraction. In actuality, data in a computer system is stored in fixed size “blocks” which are

“addressed” with a number – which ultimately is a physical location in a storage device. This is the case for data

stored in a NAS, a SAN or when using Object Storage. The system presents those blocks to the user or application in

a form that is useful. For non-transactional data, that form is usually a file.

File storage systems will also store a small amount of information that tells which of those data blocks make up

the file, in which order, and finally what “name” has been applied to the collection of blocks. This additional “data

about data” is called “file system metadata”. Keeping track of the file system metadata is the responsibility of the “file

system”. To keep even more than a few files organized, a file system imposes a hierarchy on the metadata in the form

of “directory structure”. A key concept of the file system is the notion that the files themselves have relationships to

one another – as one could think of files being “co-located” in a directory.


When the system is instructed to “read a file”, the repository of file system metadata is consulted and the required data blocks are retrieved from the storage device. Writing data into a file system has the additional complexity that the file system metadata must be written or updated - potentially by several users or processes simultaneously. Numerous techniques and designs exist that attempt to minimize the impact of dealing with file system metadata, and the “locking” problem associated with simultaneous access. Unfortunately, as the number of files in the system grows large, keeping the file system metadata correctly organized (so that the names and the data blocks that make up files can be found) becomes increasingly complex. As the requirement grows to keeping track of billions of “files” (which may be distributed across a number of network-connected computer systems), the abstraction of the “file system” begins to break down. Moreover, the hierarchical structure of the “file system” is insufficient to adequately categorize the data in the system.

File systems require at least three layers of software constructs to execute any file operation. As they allow files to

be amended by multiple users, they must maintain complex lock structures with OPEN and CLOSE semantics. These

lock structures must be distributed coherently to all of the servers used for access.

Also, as data is placed based on random block availability, traditional file systems are always fragmented. This is especially true in environments where the data is unstructured and file sizes vary widely. Using a traditional file system designed for amendable data to store immutable data constitutes an inappropriate and wasteful use of bandwidth and compute resources. This highly inefficient approach requires a great deal of additional hardware and network resources to achieve data distribution goals. These systems become exponentially more complex as they are scaled out.

Object storage systems dispense with the overburdened concept of file system metadata. This approach allows the

system to separate the storage of data from the relationship that the individual data items have to each other. In an

object storage system, the physical storage blocks are organized into “objects” which are collections of data blocks

represented by an identifier. There is no “hierarchy” imposed on the data and no repository of the objects’ metadata

to be consulted when reads or writes are requested. This approach allows an object storage system to scale with

both the requirements and size of the system, well beyond the technical & practical boundaries of traditional file

systems.

While Object Storage systems do not use file system metadata, they do employ object metadata (customizable

information about the objects). This information can later be used to query or analyze the information stored. Object

metadata for a photo could be the day it was taken, the last time it was modified, the type of camera that was used,

whether a flash was used, where it was taken, etc. Object metadata will play an increasingly important role as we

store more and more information, but it does not add complexity to the system like file system metadata does.
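To make the photo example concrete, here is a minimal sketch of querying object metadata. The metadata fields, the sample values and the `query` helper are assumptions made up for this illustration; real platforms expose their own metadata query mechanisms.

```python
# Hypothetical object metadata for three stored photos (sample values only).
photos = [
    {"id": "a1", "camera": "Camera A", "flash": True, "taken": "2013-05-01"},
    {"id": "b2", "camera": "Camera B", "flash": False, "taken": "2013-05-01"},
    {"id": "c3", "camera": "Camera A", "flash": False, "taken": "2013-06-12"},
]


def query(objects, **criteria):
    """Return the IDs of all objects whose metadata matches every criterion."""
    return [
        obj["id"]
        for obj in objects
        if all(obj.get(key) == value for key, value in criteria.items())
    ]


same_camera = query(photos, camera="Camera A")
no_flash_on_day = query(photos, flash=False, taken="2013-05-01")
```

The query only inspects the metadata attached to each object; no directory structure has to be walked to find the matching identifiers.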


• At the highest level, storage servers are, like NAS and SAN, simply boxes with a lot of disks. Typically, object storage vendors will use SATA disks in their systems, and may include SSDs for caching. Some platforms opt for separate controllers, but in essence that does not make a difference, as the storage is presented as one pool (namespace). When choosing an object storage platform, it’s important to understand the limitations of the namespace and how the system combines different pools or namespaces. Many vendors claim infinite scalability, but there is no such thing. The important thing is to understand how namespaces are combined, presented and managed. How many such namespaces can be combined? Are they managed as one system? The system software manages most of that.

• The actual software layer is where vendors can differentiate. The list of possible features is endless. A single management interface is always great. Self-healing capabilities are a must for environments that will scale into the hundreds of petabytes. The software layer also provides data protection mechanisms, which we will cover in the next section.

• The standard interface to access data in an object storage platform is a RESTful interface or REST API. This is a set of simple commands that application developers use in their code to let the application access the data. The basic REST commands are LIST, GET, PUT and DELETE, which are used to list (a selection of) objects, read an object, store an object or delete it. There is no standard REST API for object storage yet, but the so-called Amazon API is by far the most popular amongst developers. Hence, most object storage providers will provide an “Amazon compatible API”, which is typically a subset of the commands that are supported by Amazon S3. As most legacy applications were designed to interface with a file system, most object storage platforms will also provide one or more file interfaces (a file system layer on top of the object storage pool – also called a file system gateway) and often a selection of programming language-specific APIs will be provided as well. DDN’s WOS has the widest selection of interfaces on the market.
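As a rough illustration of how an application might express these commands over HTTP, the sketch below builds (but does not send) requests with Python's standard `urllib.request` module. The endpoint `objectstore.example.com`, the URL layout and the object name `photo-001` are hypothetical; each platform defines its own paths, headers and authentication. Listing is shown as a GET on the pool URL, which is how Amazon S3 expresses it.

```python
import urllib.request

BASE_URL = "http://objectstore.example.com/pool"  # hypothetical endpoint


def build_request(method, object_id="", data=None):
    """Build (but do not send) an HTTP request for one object operation."""
    url = f"{BASE_URL}/{object_id}" if object_id else BASE_URL
    return urllib.request.Request(url, data=data, method=method)


put_req = build_request("PUT", "photo-001", data=b"...image bytes...")  # store
get_req = build_request("GET", "photo-001")                             # read
delete_req = build_request("DELETE", "photo-001")                       # delete
list_req = build_request("GET")                                         # list the pool
```

Against a real service, each request would be sent with `urllib.request.urlopen(...)` after adding whatever authentication the platform requires.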


Data Protection: Erasure Coding or Not?

Several data protection mechanisms are used for object storage. The one method that is being abandoned, however, is RAID (which has been the de facto data protection scheme for SAN and NAS for the past two decades). The problem with RAID is that it was originally designed for small-capacity disks. The larger the drive capacity, the longer it takes to restore a failed drive. During this restore, the data is less protected. If you are on RAID 5, and a second disk fails in your RAID group during the lengthy restore, then data loss will occur. Also, as processing capacity is consumed by the restore, users will experience severe performance drops while data is being written to the replacement disk. Large systems with hundreds of terabytes or petabytes will be in near-constant rebuild mode, as drives routinely fail. In an effort to shorten these longer rebuild cycles, RAID systems ship with faster processors, which also consume more energy, but this only masks the problem at best.
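A back-of-the-envelope calculation shows why rebuild time grows with drive capacity. The sustained rebuild rate of 50 MB/s used here is purely an assumed figure for illustration; actual rates depend on the controller, the RAID level and the competing user load.

```python
def rebuild_hours(drive_tb, rebuild_mb_per_s):
    """Hours needed to rewrite a full drive at a sustained rebuild rate."""
    drive_mb = drive_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return drive_mb / rebuild_mb_per_s / 3600


ASSUMED_RATE_MB_PER_S = 50  # hypothetical sustained rebuild rate

for capacity_tb in (0.5, 2, 4):
    hours = rebuild_hours(capacity_tb, ASSUMED_RATE_MB_PER_S)
    print(f"{capacity_tb} TB drive: {hours:.1f} hours at reduced protection")
```

Under this assumption a 0.5 TB drive is rebuilt in under three hours, while a 4 TB drive needs roughly 22 hours, during which a RAID 5 group cannot survive a second failure.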

The simplest way to protect data is to make several copies: replication. A popular concept is called “three copies in the cloud”, promoted mostly by public cloud platforms like Amazon S3 and Rackspace®. While three copies in the cloud provides acceptable data protection, it is also very lucrative for the cloud provider, as they are in the business of selling more storage capacity. Swift™, the object storage component of OpenStack (the open source cloud infrastructure that Rackspace co-founded), also uses pure replication.

A more efficient data protection mechanism is erasure coding. Today, there are several flavors of erasure coding,

each one with its own benefits. Erasure coding’s key advantage is that you can break up your data into n fragments,

add m additional fragments, store the fragments across n+m devices, and then recover the original data from any n

of the devices. Survive 4 failures? 10? Pick a number! Also when a disk is lost, the system only has to calculate new

fragments, to be spread over any selection of disks with available capacity. This is a lot faster and more efficient than

restoring an entire RAID-based disk, even if it was only 20% full.
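The n+m principle can be sketched with the simplest possible case: n data fragments plus a single XOR parity fragment (m = 1). This is an illustrative assumption, not the scheme used by any particular product; production systems typically rely on codes such as Reed-Solomon so that m can be larger than one. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, n):
    """Split data into n equal fragments and append one XOR parity fragment."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(n)]
    return fragments + [reduce(xor_bytes, fragments)]


def recover(fragments, original_length):
    """Rebuild the data when at most one fragment (marked None) is missing."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost fragment")
    fragments = list(fragments)
    if missing:
        survivors = [frag for frag in fragments if frag is not None]
        # XOR of all surviving fragments reproduces the lost one.
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(fragments[:-1])[:original_length]


data = b"object storage"
stored = encode(data, 4)   # 4 data fragments + 1 parity, on 5 devices
stored[2] = None           # simulate the loss of one device
restored = recover(stored, len(data))
```

Any four of the five fragments are enough to rebuild the data, and only the lost fragment has to be recalculated, not an entire disk.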

Erasure coding can be implemented locally or distributed, which means that fragments are spread over multiple data centers (at least three) and the system can survive failure of a full data center. Distributed erasure coding drastically reduces the overhead (the extra storage that is needed to protect the data). Five 9’s or more are guaranteed with overhead numbers as low as 20% - as opposed to 3 copies requiring 200% overhead. The problem with distributed erasure coding is that it creates a huge WAN cost, as rebuilds require data transfer between the data centers. Also, the availability highly depends on the WAN connectivity. Each read from a distributed erasure coding pool requires data to be read from three data centers. If any of these connections has high latency, the user will notice the delayed response.
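The overhead figures can be checked with simple arithmetic. The sketch below assumes overhead is defined as extra raw capacity divided by the size of the protected data; the 10+2 fragment layout is an assumed example that happens to yield 20%, not a configuration taken from any specific product.

```python
def erasure_overhead(n, m):
    """Extra capacity, as a fraction of the data size, for n data + m coded fragments."""
    return m / n


def replication_overhead(copies):
    """Extra capacity, as a fraction of the data size, for keeping full copies."""
    return copies - 1


ec = erasure_overhead(10, 2)       # 10 data fragments + 2 coded fragments
rep = replication_overhead(3)      # "three copies in the cloud"
print(f"erasure coding 10+2: {ec:.0%} overhead")
print(f"3x replication:      {rep:.0%} overhead")
```

Storing 1 PB of data would therefore require about 1.2 PB of raw capacity with the assumed 10+2 layout, against 3 PB with three full copies.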

An interesting, “best of both worlds” solution is local erasure coding with replication. Such an architecture combines the benefits of erasure coding with those of replication (a full copy is present in each data center). While such a setup requires more overhead than the distributed alternative, the TCO is typically a lot lower due to the reduced WAN traffic.


WOS

DDN’s legacy is designing high-performance storage systems, but without making things more complex than they need to be. WOS is the perfect example of achieving operational excellence through reverse engineering - stripping the architecture down to the very basics. The architecture of WOS consists of three components: WOS building blocks, WOS Core software and a choice of simple interfaces.

• The backend of a WOS storage infrastructure consists of the WOS storage nodes. The storage nodes are essentially 4U servers filled with 60 SATA disks. A WOS infrastructure can contain as few as 3 nodes and scales to virtually unlimited capacity by adding more nodes.

• Smart storage requires intelligent software. WOS Core has

a single, straightforward management console for the entire

infrastructure - even when distributed across multiple sites. WOS

Core’s self-healing capabilities and other features drastically

reduce operator-driven maintenance interventions.

• WOS provides the most complete choice of interfaces, including

a set of native APIs, file access interfaces and S3 REST.

True Object Storage Platform

Most object storage platforms still have a POSIX file system layer on the disk level. WOS, however, was designed as a

true object storage platform, a flat, single layer, address structure where objects are stored in a contiguous group of

blocks so that disk operations are minimized (single-disk-operation reads – dual-operation writes), performance is

maximized and disks are used at full capacity.
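
The flat address structure can be pictured with a toy in-memory store. This sketch only illustrates the addressing model (one namespace, opaque identifiers, no directories); the class and method names are hypothetical, and it says nothing about how WOS lays objects out on disk.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory object store with a single flat namespace."""

    def __init__(self):
        self._objects = {}   # object ID -> (data, metadata)

    def put(self, data: bytes, **metadata) -> str:
        """Store an object and return the opaque ID the application must keep."""
        oid = uuid.uuid4().hex
        self._objects[oid] = (data, metadata)
        return oid

    def get(self, oid: str) -> bytes:
        """Fetch an object directly by ID; there is no path to traverse."""
        return self._objects[oid][0]

    def metadata(self, oid: str) -> dict:
        """Return a copy of the metadata stored with the object."""
        return dict(self._objects[oid][1])


store = FlatObjectStore()
oid = store.put(b"scan-0001", project="demo", kind="image")
assert store.get(oid) == b"scan-0001"
```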

Optimized for Small and Large Files

WOS is the only object storage system that is optimized for high-speed throughput of large data volumes and super-fast

I/O operations for small files. For multi-site deployments, the built-in WOS Latency-Aware Access Manager will

automatically address data access requests to the location with the lowest latency.
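
The routing idea, stripped to its core, is to pick the site with the lowest measured latency. The sketch below is an assumption about the general technique, not the Latency-Aware Access Manager itself; the site names and latency values are made up.

```python
def pick_site(latencies_ms: dict) -> str:
    """Return the name of the site with the lowest measured latency."""
    if not latencies_ms:
        raise ValueError("no sites available")
    return min(latencies_ms, key=latencies_ms.get)


# Hypothetical round-trip times measured from one client, in milliseconds.
measured = {"site-a": 12.0, "site-b": 85.0, "site-c": 140.0}
assert pick_site(measured) == "site-a"
```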

Choice of Data Protection Schemes

WOS offers a choice of data protection mechanisms to ensure the highest data durability AND availability. Reduce

your storage overhead while maximizing durability for single site deployments with local ObjectAssure™, DDN’s

implementation of Erasure Coding. Alternatively, you can choose Replicated ObjectAssure, to improve availability

without increasing WAN costs. Finally, ObjectAssure™ can be implemented in a distributed way to ensure higher

durability, at a lower cost.

Self-healing Architecture

Keeping traditional storage infrastructures healthy is management-intensive. Disks need to be replaced and

restored. Rebuild windows need to be kept to a minimum to avoid data loss and preserve application performance.

This is not the case with WOS. The built-in data protection algorithm, ObjectAssure, has unique self-healing

capabilities that further reduce the management effort. Also, in case of a broken disk, ObjectAssure only has to

reconstruct the actual data that was lost - as opposed to the entire disk. This dramatically reduces the rebuild

window.

Single Storage Infrastructure

WOS is the only object storage platform that seamlessly integrates with other storage tiers. It has one management

interface for the entire infrastructure and supports easy data movement between different tiers, e.g. from GPFS to

WOS and back, or from and to Lustre environments.

Widest Selection of Interfaces; Out of the Box Applications

WOS provides the most complete choice of interfaces, including a set of native APIs, file access interfaces and S3

REST. In addition, WOS can be configured with preinstalled applications such as iRODS for data management or WOS

Share for secure global file sharing.

Enterprise-grade Platform

Most vendors recommend commodity hardware for their object storage platforms. In the short term, this could

mean initial CAPEX savings, but as such devices typically have shorter replacement cycles, this significantly increases

OPEX further down the road. This is especially so for multi-petabyte deployments. While WOS was designed to be

hardware agnostic, we designed the WOS 7000 hardware to reduce TCO. Unlike commodity hardware, the WOS 7000

has an ultra-dense form factor, so there are fewer systems to house, manage, power, cool and maintain. Leveraging

over 15 years of hardware design for the most demanding HPC environments, WOS 7000 was built to run many more

years than cheaper commodity hardware.

WOS Benefits

Lowest Global Access Latency

WOS was designed with the intent of maximizing performance for storage of massive volumes of immutable data.

Scales with All Varieties of Applications

WOS scales virtually without limits, with clusters as large as 30PB. Those clusters can consist of any mix of small (kilobytes)

or large (terabytes) files.

Best Durability & System Availability

WOS’ choice of data protection schemes allows the customer to deploy object storage that combines durability with

availability.

Lowest Administration Overhead, Lowest TCO

Through automated management, lower hardware costs, less power usage, simple architecture, optimized disk

usage and reduced WAN bandwidth usage, WOS enables organizations to store more data at a much lower cost.

Simple Integration

Integrate WOS with your GRIDScaler GPFS storage or your EXAScaler Lustre platform. Use WOS as an archive for your

hScaler Big Data Storage, or build an Active Archive of WOS with a tape library for offline cold archiving.

Maximum Portability

WOS features the most complete set of interfaces to facilitate your application integration, including C++ and Java

APIs for direct application integration, REST for web applications (S3 or not) and file gateways to support file-based

workflows.
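
As a rough picture of what REST access looks like from an application, the sketch below builds (but does not send) an HTTP PUT request with Python's standard library. The endpoint URL and the `X-Meta-` header prefix are hypothetical placeholders; they are not the WOS or S3 API, and a real integration should follow the vendor's interface documentation.

```python
import urllib.request


def build_put(base_url: str, object_name: str, data: bytes, metadata: dict) -> urllib.request.Request:
    """Build an HTTP PUT that uploads an object together with its metadata."""
    headers = {"Content-Type": "application/octet-stream"}
    # Hypothetical convention: carry each metadata item in its own request header.
    headers.update({f"X-Meta-{key}": value for key, value in metadata.items()})
    return urllib.request.Request(
        f"{base_url}/{object_name}", data=data, headers=headers, method="PUT"
    )


request = build_put(
    "https://objects.example.com/bucket",   # placeholder endpoint
    "report.pdf",
    b"%PDF-1.4 ...",
    {"project": "demo"},
)
# urllib.request.urlopen(request) would send it; omitted because the endpoint is fictional.
```

The object is addressed by a URL rather than a file path, and the metadata travels with the object in the same request.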

Best Data Center Density

Designed for massive HPC deployments, WOS 7000 provides the highest data center density possible.

Ecosystem

Object Storage is clearly the hot space in the storage industry, with offerings from both startups and established

storage solution providers. But there is more than just object storage on the market: object storage has fostered a

wave of innovation that enables or leverages the paradigm.

The list of Tier 2 object storage players can be endless, especially when including the application providers. Here is

a short selection of popular gateways, WAN optimizers, collaboration platforms and other applications. This should

help to provide a better understanding of the object storage ecosystem and the opportunities and use cases.

CTERA® leverages object storage to offer a range of solutions for SMBs, enterprise branch offices and remote

users, including: data backup and recovery, file-based collaboration and mobile access.

Mezeo® also provides a number of storage solutions that leverage object storage, including: an AWS compatible

REST API and a number of file sync and share clients that give users access and collaboration capabilities from their

PC/Mac, smartphone, tablet or browser interface.

Panzura® built a NAS gateway for storage clouds. The gateway enables enterprises to combine multiple storage

(cloud) resources and make them accessible to multiple locations, presented as a unified global file system.

Aspera® both leverages and facilitates object storage. On the one hand, they have a number of applications for

collaboration, distribution etc., but the core of their technology is a protocol that optimizes how data is sent from

the object storage pool, over the WAN to a user application - or between sites, if an object storage infrastructure is

distributed over multiple locations.

Bitspeed® and Silverpeak® are active in the same space: WAN optimization, which enables faster, more reliable

and secure data transfer between storage sites - or between the object storage pool and the application. These

technologies are becoming increasingly important in the deployment of object storage based storage clouds.

Dropbox® is probably the best-known object storage success case. This early AWS S3 customer launched a file-sharing

application when no one even knew they needed one. The power of Dropbox lies in their use of deduplication

(when multiple users store the same file in their Dropbox, only one copy is kept). This way, Dropbox saves a lot on

storage costs. Deduplication is not new, but Dropbox pioneered its use in an online, object storage based application.

This also allowed them to quickly gain a large user base through a freemium model, which would have been

unaffordable otherwise.
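
The deduplication principle described above can be sketched with whole-file hashing. This is a minimal illustration of the general technique under the assumption that identical content is detected with SHA-256; it is not a description of how Dropbox actually implements it, and the class name is hypothetical.

```python
import hashlib


class DedupStore:
    """Toy store that keeps identical file contents only once."""

    def __init__(self):
        self._blobs = {}   # content hash -> bytes, stored once
        self._files = {}   # (user, filename) -> content hash

    def upload(self, user: str, filename: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)       # keep only the first copy
        self._files[(user, filename)] = digest     # every user still gets a reference

    def download(self, user: str, filename: str) -> bytes:
        return self._blobs[self._files[(user, filename)]]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


store = DedupStore()
video = b"x" * 1000
store.upload("alice", "clip.mp4", video)
store.upload("bob", "holiday.mp4", video)    # same content, different user and name
assert store.stored_bytes() == 1000          # only one physical copy is kept
```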

Box(.net)™ also started as an online file sharing application but with some very important differences. Box runs

on their own (object storage) infrastructure, which gave them more control over security, data integrity etc. (as

compared to using S3). This allowed them to bring their solution to the SMB and Enterprise markets. Today, Box.net

has grown into what can probably best be described as a storage-centric Platform as a Service, enabling organizations to

customize apps, integrate with their own applications etc.

Netflix®, which launched as a DVD-rental-by-mail service, is an early adopter of object storage: in 2007 it launched a movie

streaming service which would disrupt the market. Well before Apple added movies and TV shows to their store,

Netflix leveraged S3 to offer movies in an online format.

Apple®, Google® and Facebook® also have massive object storage deployments, but little is known about their

architectures. Apple and Google are going after the S3 end users with document sharing and other cloud storage

services. With this, they compete both with Amazon and the applications that use S3 such as Dropbox and

Evernote.

Resources

http://knowledgelayer.softlayer.com/learning/introduction-object-storage

http://docs.openstack.org/trunk/openstack-object-storage/admin/content/ch_introduction-to-openstack-object-storage.html

http://cloudarchitect.att.com/Articles/Introduction-Object-Based-Storage

http://www.conres.com/hitachi-hds-object-storage-content-platform

http://www.cs.cmu.edu/~garth/RAIDpaper/Patterson88.pdf

http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162

https://en.wikipedia.org/wiki/Representational_state_transfer#RESTful_web_APIs

Version-1 5/13

DataDirect Networks (DDN) is the world leader in massively scalable storage. Our data storage and processing solutions and professional services enable content-rich and high growth IT environments to achieve the highest levels of systems scalability, efficiency and simplicity. DDN enables enterprises to extract value and deliver business results from their information. Our customers include the world’s leading online content and social networking providers, high performance cloud and grid computing, life sciences, media production, and security and intelligence organizations. Deployed in thousands of mission critical environments worldwide, DDN’s solutions have been designed, engineered and proven in the world’s most scalable data centers to ensure competitive business advantage for today’s information powered enterprise.

For more information, go to www.ddn.com or call +1.800.837.2298

© 2013, DataDirect Networks, Inc. All Rights Reserved. DataDirect Networks, EXAScaler, GRIDScaler,

hScaler, ReACT, SFA12K, SFA, SFX, Storage Fusion Xceleration, Web Object Storage, WOS are trademarks of

DataDirect Networks. All other trademarks are the property of their respective owners.

DDN | About Us