THE CURRENT AND THE FUTURE OF CEPH
HAOMAI WANG
2015.10.30
ABOUT
I’M HAOMAI WANG
▸ Ceph core developer
▸ GSOC 2014, 2015 Ceph mentor
▸ Maintainer of KeyValueStore and AsyncMessenger; focused on performance optimization
▸ Involved in databases, local filesystems, and storage
▸ NetBSD on VirtualBox author
AGENDA
▸ What is Ceph?
▸ The current Ceph and the roadmap
WHAT IS CEPH?
WHAT IS CEPH?
CEPH MOTIVATION PRINCIPLES
▸ everything must scale horizontally
▸ no single point of failure
▸ commodity hardware
▸ self-manage whenever possible
▸ move beyond legacy approaches
▸ client/cluster instead of client/server
▸ avoid ad hoc high-availability
▸ open source
WHAT IS CEPH?
CEPH ECOSYSTEM
WHAT IS CEPH?
FEATURES
WHAT IS CEPH?
REPLICATION/TIERING
WHAT IS CEPH?
CRUSH
▸ Ceph’s data distribution mechanism
▸ Pseudo-random placement algorithm
▸ Deterministic function of inputs
▸ Clients can compute data location (see the sketch after this list)
▸ Rule-based configuration
▸ Desired/required replica count
▸ Affinity/distribution rules
▸ Infrastructure topology
▸ Weighting
▸ Excellent data distribution
▸ De-clustered placement
▸ Excellent data re-distribution
▸ Migration proportional to change
▸ failure prediction*
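CRUSH itself uses hierarchical buckets and placement rules, so the sketch below is not CRUSH. It is weighted rendezvous (HRW) hashing, a minimal stand-in that shows the property the bullets describe: placement is a deterministic function of its inputs that any client can compute locally, and removing a device only remaps the objects that mapped to it. All names here are hypothetical.

```python
import hashlib
import math

def hrw_place(obj_name, osds, replicas=3):
    """Weighted rendezvous (HRW) hashing: a deterministic function of
    (object name, OSD list), so every client computes the same answer."""
    def score(osd_id, weight):
        h = hashlib.sha256(f"{obj_name}:{osd_id}".encode()).digest()
        u = (int.from_bytes(h[:8], "big") + 1) / (2**64 + 1)  # uniform in (0,1)
        return -weight / math.log(u)  # higher weight -> proportionally more wins
    ranked = sorted(osds, key=lambda o: score(*o), reverse=True)
    return [osd_id for osd_id, _ in ranked[:replicas]]

osds = [("osd.0", 1.0), ("osd.1", 1.0), ("osd.2", 2.0), ("osd.3", 1.0)]
print(hrw_place("rbd_data.1234", osds))  # identical on every client, no lookup table
```

Dropping one OSD from the list only displaces objects that ranked it in their top-k, which is the "migration proportional to change" property above.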
WHAT IS CEPH?
USE CASES
▸ The largest Ceph cluster: CERN
▸ Yahoo Flickr
▸ SourceForge
▸ DreamHost
▸ eBay
▸ Deutsche Telekom AG
▸ OpenStack clouds (~44% use Ceph for block storage)
WHAT IS CEPH?
VENDORS
▸ Red Hat
▸ Intel
▸ SanDisk
▸ Samsung
▸ Fujitsu
▸ SUSE
▸ Canonical
THE CURRENT CEPH AND THE ROADMAP
THE CURRENT CEPH AND THE ROADMAP
INTERNAL OVERVIEW
[Architecture diagram: applications (LibRBD, RadosGW, LibRados sessions) talk to the cluster through the Messenger layer; requests pass through the Dispatch layer (replicated IO, recovery, scrub, tiering, scheduler, queues, threads) into the ObjectStore layer (FileJournal, FileStore), which in turn sits on the OS and hardware stack: file system, block device interface, sockets/TCP/IP/Ethernet, virtual memory, memory library, DRAM, IO controller, disk, network controller, port, CPU interconnect.]
THE CURRENT CEPH AND THE ROADMAP
CEPH STORAGE ENGINE
▸ FileStore
▸ NewStore: Replacing FileStore*
▸ KeyValueStore
▸ LevelDB/RocksDB/LMDB
▸ Kinetic API
▸ Samsung uFTL*
▸ Sandisk SSD Library*
▸ MemStore
▸ Memory management (malloc/free)
▸ NVM (PMBackend, libpmem)*
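All of these backends sit behind one transactional object-store abstraction. The sketch below is a deliberately simplified, hypothetical version of such an interface (Ceph's real ObjectStore is a much larger C++ class), with a MemStore-flavoured backend as the example.

```python
from abc import ABC, abstractmethod

class Transaction:
    """A batch of mutations the backend must apply atomically."""
    def __init__(self):
        self.ops = []
    def write(self, obj, data):
        self.ops.append(("write", obj, data))
    def remove(self, obj):
        self.ops.append(("remove", obj, None))

class ObjectStore(ABC):
    @abstractmethod
    def apply(self, txn): ...
    @abstractmethod
    def read(self, obj): ...

class MemStore(ObjectStore):
    """In-memory backend, in the spirit of Ceph's MemStore."""
    def __init__(self):
        self.objects = {}
    def apply(self, txn):
        for op, obj, data in txn.ops:  # a real store makes this all-or-nothing
            if op == "write":
                self.objects[obj] = data
            elif op == "remove":
                self.objects.pop(obj, None)
    def read(self, obj):
        return self.objects[obj]

store = MemStore()
txn = Transaction()
txn.write("rbd_data.0", b"hello")
store.apply(txn)
print(store.read("rbd_data.0"))  # b'hello'
```

Swapping FileStore for NewStore or a key/value backend changes the class behind the interface, not the callers above it.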
THE CURRENT CEPH AND THE ROADMAP
THE NEW TIERING
▸ The new storage mountain
▸ The new challenge:
▸ More storage media
▸ More complex management
▸ Data lake
▸ Migrate data with “temperature”
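One way to make "temperature" concrete: keep a decayed per-object access counter and promote or demote when it crosses thresholds. A toy sketch with made-up thresholds, not Ceph's actual tiering agent:

```python
class TieredObject:
    DECAY = 0.5  # halve the temperature each period (made-up constant)

    def __init__(self, name):
        self.name, self.temp, self.tier = name, 0.0, "cold"

    def touch(self):
        self.temp += 1.0  # every access heats the object up

    def decay_and_migrate(self, hot_at=5.0, cold_at=1.0):
        self.temp *= self.DECAY
        if self.tier == "cold" and self.temp >= hot_at:
            self.tier = "hot"    # promote to the fast tier
        elif self.tier == "hot" and self.temp <= cold_at:
            self.tier = "cold"   # demote back to the capacity tier

obj = TieredObject("img-001")
for _ in range(12):
    obj.touch()
obj.decay_and_migrate()
print(obj.tier, round(obj.temp, 1))  # -> hot 6.0
```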
THE CURRENT CEPH AND THE ROADMAP
NETWORK
▸ TCP Messenger
▸ POSIX sockets
▸ DPDK*
▸ SolarFlare*
▸ RDMA
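All of these backends plug in behind a single transport abstraction inside the messenger. A minimal, hypothetical sketch with a plain POSIX-socket implementation (kernel-bypass backends such as DPDK or RDMA would be alternative subclasses; names are not the AsyncMessenger API):

```python
import socket
from abc import ABC, abstractmethod

class Transport(ABC):
    """What each messenger backend (POSIX, DPDK, RDMA, ...) would implement."""
    @abstractmethod
    def connect(self, host, port): ...
    @abstractmethod
    def send(self, data): ...

class PosixTransport(Transport):
    """Plain kernel TCP sockets: the default, most portable backend."""
    def connect(self, host, port):
        self.sock = socket.create_connection((host, port))
        # small messages dominate OSD traffic, so disable Nagle
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    def send(self, data):
        self.sock.sendall(data)
```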
THE CURRENT CEPH AND THE ROADMAP
QOS
▸ Priority-based
▸ client priority
▸ message priority
▸ mClock algorithm*
▸ each message carries a “tag”
▸ window sizes exchanged peer-to-peer
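mClock proper schedules against per-client reservation/limit/weight tags; the toy below only shows the simpler priority-based dispatch from the first bullets (all names hypothetical):

```python
import heapq
import itertools

class DispatchQueue:
    """Dequeue by client priority, then message priority, FIFO among
    equals (higher number = more important)."""
    def __init__(self):
        self.heap = []
        self.seq = itertools.count()  # tie-breaker preserving arrival order
    def push(self, client_prio, msg_prio, msg):
        # heapq is a min-heap, so negate priorities
        heapq.heappush(self.heap, (-client_prio, -msg_prio, next(self.seq), msg))
    def pop(self):
        return heapq.heappop(self.heap)[3]

q = DispatchQueue()
q.push(1, 5, "background scrub")
q.push(10, 1, "client read")
print(q.pop())  # -> "client read": client priority dominates
```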
THE CURRENT CEPH AND THE ROADMAP
LIBRADOS
▸ Object
▸ Name
▸ Attributes
▸ Data
▸ key/value data
▸ random access insertion, deletion, range query/list
▸ Operation
▸ CAS(Compare And Swap)
▸ Group Operation: Atomic, Rollback
▸ Snapshot: Object Granularity
▸ Copy On Write
▸ Rados Classes
▸ code runs directly inside storage server I/O path
▸ Watch/Notify
▸ Multi Object Transactions*
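The basics of this object model can be exercised with the real librados Python binding, `rados`. A usage sketch assuming a reachable cluster, a readable ceph.conf, and an existing pool named "data" (CAS, group operations, and snapshots have their own calls in the same binding, omitted here):

```python
import rados  # the librados Python binding

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("data")          # assumes a pool named "data"

ioctx.write_full("greeting", b"hello")      # object data
ioctx.set_xattr("greeting", "lang", b"en")  # object attribute
print(ioctx.read("greeting"))               # b'hello'
print(ioctx.get_xattr("greeting", "lang"))  # b'en'

ioctx.close()
cluster.shutdown()
```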
THE CURRENT CEPH AND THE ROADMAP
RADOS CLASSES - COMPUTE IN STORAGE SIDE
▸ write new RADOS “methods”
▸ code runs directly inside storage server I/O path
▸ simple plugin API; admin deploys a .so
▸ read-side methods
▸ process data, return result
▸ write-side methods
▸ process, write; read, modify, write
▸ generate an update transaction that is applied atomically
▸ Use cases:
▸ distributed “grep”
▸ Lua interpreter
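The real plugin API is C++ (a .so built against the cls SDK). This Python registry is purely a hypothetical illustration of the read-side shape: run the method next to the data, ship back only the result.

```python
# Hypothetical registry standing in for the C++ cls plugin API.
METHODS = {}

def rados_method(name):
    def register(fn):
        METHODS[name] = fn
        return fn
    return register

@rados_method("grep")
def grep(object_data, pattern):
    """Read-side method: runs next to the data, returns only matches."""
    return [line for line in object_data.splitlines() if pattern in line]

# Server-side dispatch: kilobytes of result cross the wire, not the object.
print(METHODS["grep"](b"foo\nbar\nfoobar\n", b"foo"))  # [b'foo', b'foobar']
```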
THE CURRENT CEPH AND THE ROADMAP
RBD
▸ Thin Provisioning
▸ Snapshot
▸ Clone
▸ Multi-Client Support
▸ Kernel Client
▸ KVM/Xen
▸ VMware VVols*
▸ iSCSI
▸ LIO TCMU + loopback (FUSE)*
▸ Active/Passive*
▸ Active/Active**
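Thin provisioning, snapshots, and clones all derive from one representation: an image is a sparse map of fixed-size objects, and a clone reads through to its parent until a block is first overwritten. A toy sketch, not librbd:

```python
OBJ_SIZE = 4 * 2**20  # RBD's default 4 MiB object size

class Image:
    """An image is a sparse map of fixed-size objects."""
    def __init__(self, parent=None):
        self.blocks = {}      # thin provisioning: only written objects exist
        self.parent = parent  # a clone reads through to its parent
    def write(self, idx, data):
        self.blocks[idx] = data           # first write detaches the block (COW)
    def read(self, idx):
        if idx in self.blocks:
            return self.blocks[idx]
        if self.parent is not None:
            return self.parent.read(idx)  # fall through to the parent snapshot
        return b"\x00" * OBJ_SIZE         # unwritten blocks read as zeros

base = Image()
base.write(0, b"golden image block")
clone = Image(parent=base)                # instant: no data is copied
print(clone.read(0))                      # served from the parent
clone.write(0, b"clone's own block")      # COW break on first write
print(clone.read(0), base.read(0))        # clone and parent now diverge
```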
THE CURRENT CEPH AND THE ROADMAP
RADOSGW
▸ S3/Swift
▸ Active/Slave
▸ One Writer
▸ Multi Active Sites*
▸ Hadoop/Spark FileSystem Interface*
▸ NFS protocol aware*
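Because RGW speaks the S3 protocol, stock S3 tooling works against it. A usage sketch with boto3, assuming a gateway on its default port 7480 and placeholder credentials:

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:7480",  # RGW's default civetweb port
    aws_access_key_id="ACCESS_KEY",        # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)
s3.create_bucket(Bucket="demo")
s3.put_object(Bucket="demo", Key="hello.txt", Body=b"hello from rgw")
print(s3.get_object(Bucket="demo", Key="hello.txt")["Body"].read())
```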
THE CURRENT CEPH AND THE ROADMAP
CEPHFS
▸ Dynamic subtree partition
▸ Strictly POSIX compatible
▸ NFS
▸ QEMU VM
▸ virtio
▸ NFS over vsock
▸ FSCK
▸ Multi-tenant
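Dynamic subtree partitioning assigns directory subtrees to MDS ranks and migrates hot subtrees between them as load shifts. A toy illustration of the ownership lookup, with a hypothetical subtree map:

```python
# Hypothetical subtree -> MDS-rank map; entries migrate as load shifts.
SUBTREES = {"/": 0, "/home": 1, "/home/alice": 2}

def authority(path):
    """The deepest subtree containing the path owns its metadata."""
    parts = path.rstrip("/").split("/")
    while parts:
        prefix = "/".join(parts) or "/"
        if prefix in SUBTREES:
            return SUBTREES[prefix]
        parts.pop()
    return SUBTREES["/"]

print(authority("/home/alice/thesis.tex"))  # -> 2
print(authority("/home/bob/notes.txt"))     # -> 1
print(authority("/etc/ceph.conf"))          # -> 0
```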
THANK YOU!
2015.10
END