
Ceph Software-Defined Storage

“The power to do more!”

Andrew Underwood – HPC Enterprise Technologist

ANZ Solutions Engineering Team

Agenda

Melbourne, Australia – 2015 Open Storage Workshop

- The Changing Storage Market

- Introduction to Ceph

- Ceph Architecture

- Key Benefits of Ceph

- Case Study – Some of Dell’s Australian Customers

- Next Steps to Building Your Scalable Storage Solution….

Storage Market Is Changing

• Storage needs are exploding

• There are limited highly scalable storage options available

• Continued cost pressures and budget squeezes are limiting IT expenditure

• The cost of proprietary technologies is increasing with annual license fees

• Current storage technologies are limiting when development teams want to scale out their workloads

• New data sets are changing the way IT departments need to think about storage

HPC – Enabling exascale computing on massive data sets

OpenStack – Helping enterprises build open, interoperable clouds

Big Data – Turning customer data into value

The forces that drive Dell also drive our customers

Introduction to Ceph

Object Storage
• Multi-tenant
• User management
• Billing capable
• OpenStack Keystone
• OpenStack Swift API
• Disaster recovery

Block Storage
• Thinly provisioned
• Snapshots
• Copy-on-write cloning
• In-memory caching
• Native Linux kernel support
• Support for KVM & Xen
• Images up to 16 exabytes

File System
• POSIX semantics
• Linux kernel client
• CIFS/NFS
• HDFS
• Distributed metadata
• Dynamic rebalancing
• Snapshots

Introduction to Ceph (continued…)

CEPHFS – A distributed file system with POSIX semantics and scale-out metadata management

RGW – A web services gateway for object storage, compatible with S3 and Swift

RBD – A reliable, fully distributed block device with cloud platform integration

LIBRADOS – A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)

RADOS – A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

[Diagram: Applications, Virtual Machines and Clients sit on top of this stack]
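The LIBRADOS layer is what lets an application talk to the cluster directly. As a rough illustration only, a minimal sketch using the python-rados bindings might look like this (the config file path, pool name and object name are assumptions, not part of the original deck):

    import rados

    # Connect to the cluster using a standard client config file and keyring.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # assumed path
    cluster.connect()

    # Open an I/O context on an existing pool (pool name is a placeholder).
    ioctx = cluster.open_ioctx('rbd')

    # Write an object straight into RADOS, then read it back.
    ioctx.write_full('hello_object', b'Hello, RADOS!')
    print(ioctx.read('hello_object'))

    ioctx.close()
    cluster.shutdown()

Equivalent calls exist in the C, C++, Java, Ruby and PHP bindings listed above.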

Ceph Architecture 101


Object Storage Daemons

Ceph OSDs store data; handle data replication, recovery, backfilling and rebalancing; and provide monitoring information to Ceph Monitors by checking other Ceph OSD Daemons for a heartbeat.

Monitoring Nodes

Ceph Monitors manage the health of the cluster and maintain the cluster map

Metadata Server

The MDS stores metadata on behalf of CephFS – this

is not a dedicated server and is distributed.

Ceph Architecture 101

The Ceph Storage Cluster - RADOS

CephFS, Ceph Object Storage and Ceph Block Devices read data from and write data to the Ceph Storage Cluster.

Based on RADOS, the Ceph Storage Cluster consists of two types of daemons: a Ceph OSD Daemon (OSD), which stores data as objects on a storage node, and a Ceph Monitor (MON), which maintains a master copy of the cluster map.

A Ceph Storage Cluster provides a single logical object store to clients and is responsible for data migration, replication, failure detection and failure recovery.

[Diagram: CEPHFS, RGW and RBD sit on LIBRADOS, which in turn sits on RADOS – the layer of Object Storage Daemons and Monitoring Nodes described above]

Ceph Architecture 101

Metadata Server


The MDS cluster is diskless; MDSs serve as an index into the OSD cluster to facilitate reads and writes. All metadata, as well as data, is stored in the OSD cluster.

Ceph Architecture 101

Monitoring Nodes


Ceph’s monitors maintain a master copy of the cluster map. Ceph processes and clients contact a monitor periodically to ensure they have the most recent copy of the cluster map. Ceph seeks agreement among the monitor instances regarding the state of the cluster; to reach consensus, Ceph always uses an odd number of monitors (3, 5, 7…) and the Paxos algorithm.

The monitoring nodes are critical, but commodity hardware is key: lower TCO with commodity x86, high-reliability rack-mount servers.
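Because the monitors hold the authoritative cluster map, any client can ask the quorum for the current cluster state. A hedged sketch with python-rados (the mon_command below issues the same request as the ceph status CLI; the config path is an assumption, and field names in the returned JSON vary slightly between Ceph releases):

    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # assumed path
    cluster.connect()

    # Ask the monitor quorum for the cluster status.
    cmd = json.dumps({"prefix": "status", "format": "json"})
    ret, outbuf, errs = cluster.mon_command(cmd, b'')
    status = json.loads(outbuf)

    print("cluster health:", status.get("health"))
    print("monitors in quorum:", status.get("quorum_names"))

    cluster.shutdown()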

Ceph Architecture 101

Ceph Object Storage Daemon


The Ceph Object Storage Daemon is the main component of the cluster; it serves up the storage objects on the physical disks. x86 servers such as the Dell R730XD, or an R630 connected to an MD1200 JBOD, can be used to house the OSDs (physical disks for storage), with a balanced ratio of CPU, memory and network bandwidth connecting them as a distributed cluster running the OSD daemons.

As a best practice, Dell recommends 1GHz of x86 CPU per OSD (i.e. 1GHz per HDD/SSD) and 2GB of RAM per OSD.

This means a Dell PowerEdge R730XD with 12 × 4TB HDDs (48TB raw) would require 24GB of RAM and one 8-core E5-2630L v3 1.8GHz CPU (8 × 1.8GHz = 14.4GHz).

Object Storage Device (OSD) – the physical disk, not to be confused with a Ceph OSD, which is a daemon! Each is formatted with a local file system: btrfs, xfs or ext4.
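To turn that rule of thumb into numbers for a specific node, here is a small worked example (the helper below is illustrative, not a Dell sizing tool; the node spec mirrors the R730XD example above):

    # Rule of thumb from the slide: ~1GHz of x86 CPU and ~2GB of RAM per OSD.
    GHZ_PER_OSD = 1.0
    GB_RAM_PER_OSD = 2.0

    def size_osd_node(num_disks, disk_tb, cores, core_ghz):
        """Return raw capacity and CPU/RAM requirements for one OSD node."""
        required_ghz = num_disks * GHZ_PER_OSD
        available_ghz = cores * core_ghz
        return {
            "raw_tb": num_disks * disk_tb,
            "required_ram_gb": num_disks * GB_RAM_PER_OSD,
            "required_ghz": required_ghz,
            "available_ghz": available_ghz,
            "cpu_ok": available_ghz >= required_ghz,
        }

    # R730XD example: 12 x 4TB HDD, one 8-core 1.8GHz E5-2630L v3.
    print(size_osd_node(num_disks=12, disk_tb=4, cores=8, core_ghz=1.8))
    # -> 48TB raw, 24GB RAM, 12GHz required vs 14.4GHz available, cpu_ok=True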

Ceph Architecture 101

RADOS Cluster Map


[Diagram: a RADOS node runs one OSD per disk, with each OSD sitting on a local file system (btrfs, xfs or ext4); multiple nodes together form the RADOS cluster]

Ceph Architecture 101

RADOS Cluster Map

6 × 4TB OSDs per node (one OSD per disk, each on its own file system) = 24TB per node

× 42 nodes at 24TB per node ≈ 1PB raw cluster capacity

What about replication?

3 replicas per object across ~1PB raw ≈ 336TB usable
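The same arithmetic written out as code, so it can be re-run for different node counts or replica settings (the figures below mirror the example above; real deployments also reserve headroom for rebalancing and near-full thresholds):

    def cluster_capacity(nodes, disks_per_node, disk_tb, replicas=3):
        """Raw and usable capacity, in TB, of a replicated RADOS cluster."""
        raw_tb = nodes * disks_per_node * disk_tb
        usable_tb = raw_tb / replicas  # each object is stored 'replicas' times
        return raw_tb, usable_tb

    raw, usable = cluster_capacity(nodes=42, disks_per_node=6, disk_tb=4)
    print(f"raw: {raw} TB (~1PB), usable with 3 replicas: {usable:.0f} TB")
    # -> raw: 1008 TB (~1PB), usable with 3 replicas: 336 TB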

Ceph Architecture 101

Erasure Coding Part 1

[Diagram: Compute Node 1 runs the Application on top of RBD and RADOS; Storage Nodes 1, 2 and 3 each run five OSDs, one per disk, each on a local file system]

The Compute Node / Application initiates a read request and RADOS sends the request to the primary OSD

The primary OSD reads the data from disk and completes the read request

Ceph Architecture 101

Erasure Coding Part 2

[Diagram: Compute Node 1 runs the Application on top of RBD and RADOS; Storage Nodes 1, 2 and 3 each run five OSDs, one per disk, each on a local file system]

The Compute Node / Application writes data and RADOS sends the request to the primary OSD

The primary OSD identifies the replica OSDs and sends them the data, then writes the data to its own disk

The replica OSDs write the data to disk and inform the primary OSD when complete

The primary OSD informs the Compute Node / Application that the process is complete
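From the client’s point of view that whole sequence hides behind a single write call: the acknowledgement only comes back through the primary OSD once the replicas have confirmed. A rough python-rados sketch of the same idea (config path and pool name are placeholders):

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # assumed path
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')                      # placeholder pool

    # Asynchronous write: RADOS maps the object to a primary OSD, the primary
    # forwards the data to its replica OSDs, and the completion fires once the
    # write has been acknowledged back through the primary.
    completion = ioctx.aio_write_full('demo_object', b'replicated payload')
    completion.wait_for_complete()

    ioctx.close()
    cluster.shutdown()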

Key Benefits

Enterprise Ready

Open Source

Massively Scalable

Extensible

Low TCO

No Single Point of Failure

Self-Managing

Rapid Provisioning

• Enterprise Ready – Dell and our numerous software partners certify and support this solution

• Open Source – Development timeframes are accelerated and there is no proprietary lock-in

• Enables companies to scale out their storage cost-effectively versus other OpenStack & VM solutions

• Provisions storage resources efficiently and is easy to manage at scale

• Commodity storage servers reduce the cost per gigabyte compared to proprietary arrays

• Provides a scalable and resilient storage solution on commodity hardware

• Intelligent software algorithms to manage placement and replication

• APIs for C, C++, Java, Python, Ruby and PHP – all talking directly to RADOS

• Unified storage platform (Object + Block + File)
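As a concrete illustration of the unified-platform point, the same cluster that served object and librados calls earlier can also hand out block devices through the python-rbd bindings. A minimal sketch (pool and image names are placeholders):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # assumed path
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')                      # placeholder pool

    # Create a thin-provisioned 4GiB block image, then write to it like a disk.
    rbd.RBD().create(ioctx, 'demo-image', 4 * 1024**3)
    with rbd.Image(ioctx, 'demo-image') as image:
        image.write(b'first block of data', 0)             # (data, offset)

    ioctx.close()
    cluster.shutdown()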

Dell & Australian Research Universities Leading the Way with OpenStack HPC Cloud

Case study

High-impact research lifts off into a new type of cloud

Read the full Press Release >

Institute builds research infrastructure for the future using community-driven, open-source components

Enables researchers to respond rapidly to new developments by providing instant access to scalable computing resources and applications

Researchers can share computational results easily with collaboration partners around the world

NCI builds country’s first HPC research cloud on OpenStack at Australian National University

The new science cloud will provide Australian researchers, for the first time, with on-demand access to high-performance compute and storage resources, offering increased technological capabilities compared to commercial and academic cloud offerings, according to Dr. Joseph Antony, NCI Cloud Services Manager.

Next Steps to Building Your Scalable Storage Solution….

Today – Contact your Dell Solution Consultant to review your enterprise environment

Tomorrow – Set up a pilot Ceph deployment

The Future – Scale out, not up

Thank you

Let’s grab a coffee and chat more!

Andrew Underwood – HPC Enterprise Technologist

[email protected]

• Dell.com/OpenStack

• Dell.com/HPC
