Ceph - A distributed storage system

Page 1: Ceph - A distributed storage system

A distributed storage system

Page 2: Ceph - A distributed storage system

whoami

● Italo Santos

● @ Locaweb since 2007

● Sysadmin @ Storage Team

Page 3: Ceph - A distributed storage system

Introduction

● Single Storage System

● Scalable

● Reliable

● Self-healing

● Fault Tolerant

● NO single point of failure

Page 4: Ceph - A distributed storage system

Architecture

Page 5: Ceph - A distributed storage system

RADOS

A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS

A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD

A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS

A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW

A bucket-based REST gateway, compatible with S3 and Swift

[Figure: architecture stack; client labels: APP, APP, HOST/VM, CLIENT]

Page 6: Ceph - A distributed storage system

Ceph Storage Cluster

Page 7: Ceph - A distributed storage system

OSDs

Monitors

MDS

Page 8: Ceph - A distributed storage system

OSDs

Page 9: Ceph - A distributed storage system

OSDs

● One per disk

● Store data

● Replication

● Recovery

● Backfilling

● Rebalancing

● OSDs heartbeat

Page 10: Ceph - A distributed storage system

[Figure: each OSD sits on top of a file system (btrfs, xfs, or ext4) on its own disk; monitors (M) run alongside the OSDs]

Page 11: Ceph - A distributed storage system

Ceph Monitors

Page 12: Ceph - A distributed storage system

Monitors

● Cluster map

● Monitor map

● OSD map

● Placement Group map

● CRUSH map

Page 13: Ceph - A distributed storage system

Metadata Server (MDS)

Page 14: Ceph - A distributed storage system

MDS

● Used only by CephFS

● POSIX-compliant shared filesystem

● Manages metadata

○ Directory hierarchy

○ File metadata

● Stores metadata on RADOS

Page 15: Ceph - A distributed storage system

CRUSH

Page 16: Ceph - A distributed storage system

CRUSH

● Pseudo-random placement algorithm

○ Fast calculation

○ Deterministic

● Statistically uniform distribution

● Limited data migration on change

● Rule-based configuration

Page 17: Ceph - A distributed storage system

● Objects map to placement groups: pg = hash(object name) % num pg

● Placement groups map to OSDs: CRUSH(pg, cluster state, rule set)
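
The two-step lookup on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the real CRUSH algorithm: the second step uses rendezvous hashing as a stand-in, and it ignores weights, failure domains, and rule sets. The names `stable_hash`, `object_to_pg`, `pg_to_osds`, and `locate` are hypothetical and do not come from Ceph.

```python
import hashlib


def stable_hash(text):
    """Return an integer hash that is stable across runs (unlike hash())."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def object_to_pg(object_name, num_pg):
    """Step 1 from the slide: hash(object name) % num pg."""
    return stable_hash(object_name) % num_pg


def pg_to_osds(pg, osds, replicas):
    """Step 2, simplified: rank OSDs by a per-(pg, osd) score, keep the top ones.

    Assumption: rendezvous hashing stands in for CRUSH here. It is
    deterministic and pseudo-random, but has no notion of rules or topology.
    """
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg}:{osd}"), reverse=True)
    return ranked[:replicas]


def locate(object_name, num_pg, osds, replicas=3):
    """Compute where an object lives without asking a central lookup table."""
    pg = object_to_pg(object_name, num_pg)
    return pg, pg_to_osds(pg, osds, replicas)


cluster = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
pg, acting = locate("my-object", num_pg=128, osds=cluster)
print(f"my-object -> pg {pg} -> {acting}")
```

Because every client runs the same calculation over the same inputs, all of them arrive at the same answer. In this stand-in, adding an OSD changes a PG's set only when the new OSD ranks into it, which gives a rough feel for "limited data migration on change".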

Page 20: Ceph - A distributed storage system

CLIENT

??

Page 21: Ceph - A distributed storage system

Placement Groups (PGs)

Page 22: Ceph - A distributed storage system

Placement Groups

● Logical collection of objects

● Maps PGs to OSDs dynamically

● Computationally less expensive

○ Reduces the number of processes

○ Less per-object metadata

● Dynamic rebalancing

Page 23: Ceph - A distributed storage system

Placement Groups

Page 24: Ceph - A distributed storage system

Placement Groups

● Increasing the number of PGs reduces per-OSD load

● ~100 PGs per OSD

(i.e., OSDs per object = number of replicas)

● Defined on pool creation

● PGs with multiple pools

○ Balance PGs per pool with PGs per OSD
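
The "~100 PGs per OSD" guideline can be turned into a rough sizing calculation. The sketch below assumes a single pool, divides by the replica count because each PG lands on that many OSDs, and rounds up to a power of two. The rounding step and the function name `suggested_pg_count` are assumptions for illustration and are not stated on the slide.

```python
def suggested_pg_count(num_osds, replicas, pgs_per_osd=100):
    """Estimate a pool's PG count from the ~100 PGs per OSD guideline.

    Each PG is stored on `replicas` OSDs, so the pool needs roughly
    num_osds * pgs_per_osd / replicas PGs. Rounding up to a power of two
    is an assumption of this sketch.
    """
    if num_osds <= 0 or replicas <= 0 or pgs_per_osd <= 0:
        raise ValueError("all arguments must be positive")
    target = num_osds * pgs_per_osd / replicas
    power = 1
    while power < target:
        power *= 2
    return power


# 10 OSDs with 3 replicas: 10 * 100 / 3 is about 333, rounded up to 512.
print(suggested_pg_count(num_osds=10, replicas=3))  # 512
```

With several pools on the same OSDs, the per-pool results add up, so each pool would get a share of this budget rather than the full figure.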

Page 25: Ceph - A distributed storage system

Pools

Page 26: Ceph - A distributed storage system

Pools

● Replicated

○ Object replicated N times (e.g., default size = 3)

○ Object + 2 protection replicas

● Erasure Coded

○ Stores objects as K+M chunks (i.e., size = K+M)

○ Divided into K data chunks and M coding chunks
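
To compare the two pool types, the following sketch estimates the raw capacity one object consumes in each. It is a simplification: it assumes chunks are sized by plain ceiling division and ignores any padding or metadata a real cluster adds. The function names are hypothetical.

```python
def replicated_raw_bytes(object_bytes, size=3):
    """Raw bytes used by a replicated pool: one full copy per replica."""
    return object_bytes * size


def erasure_coded_raw_bytes(object_bytes, k, m):
    """Raw bytes used by an erasure-coded pool with K data and M coding chunks.

    The object is split into k data chunks of equal size (rounded up),
    and m coding chunks of the same size are stored alongside them.
    """
    chunk_bytes = -(-object_bytes // k)  # ceiling division
    return chunk_bytes * (k + m)


MIB = 1024 * 1024
obj = 4 * MIB

# Replicated, size = 3: three full copies, survives the loss of 2 of them.
print(replicated_raw_bytes(obj) // MIB)              # 12

# Erasure coded, K = 4 and M = 2: six 1 MiB chunks, survives the loss of 2.
print(erasure_coded_raw_bytes(obj, k=4, m=2) // MIB)  # 6
```

In this example both layouts tolerate two lost copies or chunks, but the erasure-coded layout uses half the raw space, at the cost of extra computation to encode and rebuild data.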

Page 29: Ceph - A distributed storage system

Ceph Clients

Page 31: Ceph - A distributed storage system

RadosGW

Ceph Object Gateway

Page 32: Ceph - A distributed storage system

RadosGW

● Object Storage Interface

● Apache + FastCGI

● S3-compatible

● Swift-compatible

● Common namespace

● Stores data on the Ceph cluster

Page 33: Ceph - A distributed storage system

RBD

RADOS Block Device

Page 34: Ceph - A distributed storage system

RBD

● Block device interface

● Data striped across the Ceph cluster

● Thin-provisioned

● Snapshot support

● Linux kernel client, or userspace access via librbd

● Cloud native support
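
Striping and thin provisioning can be illustrated with a toy in-memory block device. This is a sketch under stated assumptions, not librbd: the class `ThinImage` is hypothetical, a plain dictionary stands in for the object store, and the 4 MiB object size is an assumed default that the slide does not state.

```python
class ThinImage:
    """Toy block device: the image is cut into fixed-size objects,
    and an object is only created the first time it is written."""

    def __init__(self, size_bytes, object_size=4 * 1024 * 1024):
        self.size_bytes = size_bytes
        self.object_size = object_size
        self.objects = {}  # object index -> bytearray, created lazily

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > self.size_bytes:
            raise ValueError("write outside image")
        pos = 0
        while pos < len(data):
            # Find which object this byte range falls into, and where.
            index, start = divmod(offset + pos, self.object_size)
            length = min(self.object_size - start, len(data) - pos)
            obj = self.objects.setdefault(index, bytearray(self.object_size))
            obj[start:start + length] = data[pos:pos + length]
            pos += length

    def read(self, offset, length):
        if offset < 0 or offset + length > self.size_bytes:
            raise ValueError("read outside image")
        out = bytearray()
        pos = 0
        while pos < length:
            index, start = divmod(offset + pos, self.object_size)
            chunk = min(self.object_size - start, length - pos)
            obj = self.objects.get(index)
            # Never-written regions read back as zeros.
            out += obj[start:start + chunk] if obj is not None else bytes(chunk)
            pos += chunk
        return bytes(out)

    def allocated_bytes(self):
        """Space actually consumed, as opposed to the advertised size."""
        return len(self.objects) * self.object_size


image = ThinImage(size_bytes=10 * 1024**3)  # a "10 GiB" image
image.write(4 * 1024 * 1024 - 2, b"ceph")   # straddles objects 0 and 1
print(sorted(image.objects))                # [0, 1]
print(image.allocated_bytes())              # 8388608
```

The image advertises 10 GiB but consumes only two objects after one small write. In a real cluster each of those objects would be placed independently, which is what spreads a single block device across many OSDs.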

Page 35: Ceph - A distributed storage system

CephFS

Ceph File System

Page 36: Ceph - A distributed storage system

CephFS

● POSIX-compliant filesystem

● Shared filesystem

● Directory hierarchy

● File metadata (owner, timestamps, mode, etc.)

● Ceph MDS required

● NOT production ready!

Page 38: Ceph - A distributed storage system

Thanks

Italo Santos @ Storage Team