Ceph vs Swift Performance Evaluation on a Small Cluster
eduPERT monthly call
July 24th, 2014
Jul 24, 2014 GÉANT eduPERT meeting
About me
• Vincenzo Pii
• Researcher @
• Leading research initiative on Cloud Storage
  • Under the theme IaaS
• More on ICCLab: www.cloudcomp.ch
About this work
• Performance evaluation study on cloud storage
• Small installations
• Hardware resources hosted at the ZHAW ICCLab data center in Winterthur
• Two OpenStack clouds (stable and experimental)
• One cluster dedicated to storage research
Cloud storage
• Cloud storage
  • Based on distributed, parallel, fault-tolerant file systems
  • Distributed resources exposed through a single homogeneous interface
  • Typical requirements
    • Highly scalable
    • Replication management
    • Redundancy (no single point of failure)
    • Data distribution
    • …
• Object storage
  • A way to manage/access data in a storage system
  • Typical alternatives
    • Block storage
    • File storage
Ceph and Swift
• Ceph (ceph.com)
  • Supported by Inktank
    • Recently purchased by Red Hat (owners of GlusterFS)
  • Mostly developed in C++
  • Started as PhD thesis project in 2006
  • Block, file and object storage
• Swift (launchpad.net/swift)
  • OpenStack object storage
  • Completely written in Python
  • RESTful HTTP APIs
Objectives of the study
1. Performance evaluation of Ceph and Swift on a small cluster
  • Private storage
  • Storage backend for own-apps with limited requirements in size
  • Experimental environments
2. Evaluate Ceph maturity and stability
  • Swift already widely deployed and industry-proven
3. Hands-on experience
  • Configuration
  • Tooling
  • …
Network configuration
• Three servers on a dedicated VLAN
• 1 Gbps NICs
• 1000BaseT cabling
• Subnet 10.0.5.x/24: Node 1 (.2), Node 2 (.3), Node 3 (.4)
Servers configuration
• Hardware
  • Lynx CALLEO Application Server 1240
  • 2x Intel® Xeon® E5620 (4 core)
  • 8x 8 GB DDR3 SDRAM, 1333 MHz, registered, ECC
  • 4x 1 TB Enterprise SATA-3 Hard Disk, 7200 RPM, 6 Gb/s (Seagate® ST1000NM0011)
  • 2x Gigabit Ethernet network interfaces
• Operating system
  • Ubuntu 14.04 Server Edition with Kernel 3.13.0-24-generic
Disks performance
READ:
$ sudo hdparm -t --direct /dev/sdb1
/dev/sdb1:
Timing O_DIRECT disk reads: 430 MB in 3.00 seconds = 143.17 MB/sec
WRITE:
$ dd if=/dev/zero of=anof bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 8.75321 s, 123 MB/s
Network performance
$ iperf -c ceph-osd0
------------------------------------------------------------
Client connecting to ceph-osd0, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.5.2 port 41012 connected with 10.0.5.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.10 GBytes 942 Mbits/sec
942 Mbit/s ≈ 117.75 MB/s
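As a rough cross-check of the measurements above, the iperf figure can be converted to bytes per second and compared with the raw disk numbers. This is a minimal sketch, assuming decimal units (1 MB = 10^6 bytes) and treating the units reported by hdparm and dd as equivalent for this coarse comparison; the function and variable names are illustrative only.

```python
def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (decimal units)."""
    return mbit_per_s / 8.0


# Figures taken from the hdparm, dd and iperf runs shown in the slides.
measured_mb_per_s = {
    "disk read (hdparm)": 143.17,
    "disk write (dd)": 123.0,
    "network (iperf)": mbit_to_mbyte(942),
}

# The slowest component bounds what a single remote client can observe.
bottleneck = min(measured_mb_per_s, key=measured_mb_per_s.get)

if __name__ == "__main__":
    for name, value in measured_mb_per_s.items():
        print(f"{name}: {value:.2f} MB/s")
    print("bottleneck:", bottleneck)
```

Under these assumptions the 1 Gbps link, not a single disk, is the first limit a remote client would hit.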
Ceph OSDs
• Monitor (mon0): HDD1 (OS); remaining three disks not used
• Storage node 0: HDD1 (OS); osd0 (XFS); osd1 (XFS); Journal
• Storage node 1: HDD1 (OS); osd2 (XFS); osd3 (XFS); Journal
cluster-admin@ceph-mon0:~$ ceph status
cluster ff0baf2c-922c-4afc-8867-dee72b9325bb
health HEALTH_OK
monmap e1: 1 mons at {ceph-mon0=10.0.5.2:6789/0}, election epoch 1,
quorum 0 ceph-mon0
osdmap e139: 4 osds: 4 up, 4 in
pgmap v17348: 632 pgs, 13 pools, 1834 bytes data, 52 objects
199 MB used, 3724 GB / 3724 GB avail
632 active+clean
cluster-admin@ceph-mon0:~$ ceph osd tree
# id weight type name up/down reweight
-1 3.64 root default
-2 1.82 host ceph-osd0
0 0.91 osd.0 up 1
1 0.91 osd.1 up 1
-3 1.82 host ceph-osd1
2 0.91 osd.2 up 1
3 0.91 osd.3 up 1
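The weights in this tree match the common convention of setting an OSD's CRUSH weight to the disk capacity expressed in TiB. That convention is an assumption about how this cluster was deployed; the output itself does not state it. A small sketch of the arithmetic for the 1 TB drives listed in the server configuration:

```python
TIB = 2 ** 40  # bytes in one tebibyte
GIB = 2 ** 30  # bytes in one gibibyte


def crush_weight(capacity_bytes: int) -> float:
    """Capacity in TiB, rounded to two decimals (assumed weighting convention)."""
    return round(capacity_bytes / TIB, 2)


disk_bytes = 10 ** 12                      # one 1 TB drive, decimal terabyte
osd_weight = crush_weight(disk_bytes)      # per-OSD weight
host_weight = round(2 * osd_weight, 2)     # two OSDs per storage node
root_weight = round(2 * host_weight, 2)    # two storage nodes
raw_gib = 4 * disk_bytes / GIB             # raw capacity of the four OSD disks

if __name__ == "__main__":
    print(osd_weight, host_weight, root_weight, round(raw_gib))
```

The raw capacity computed this way comes out slightly above the 3724 GB reported by `ceph status`, which is plausible once partitioning and file system overhead are taken into account.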
Swift devices
Building rings on storage devices
(No separation of Accounts, Containers and Objects)
export ZONE= # set the zone number for that storage device
export STORAGE_LOCAL_NET_IP= # and the IP address
export WEIGHT=100 # relative weight (higher for bigger/faster disks)
export DEVICE=
swift-ring-builder account.builder add z$ZONE-$STORAGE_LOCAL_NET_IP:6002/$DEVICE $WEIGHT
swift-ring-builder container.builder add z$ZONE-$STORAGE_LOCAL_NET_IP:6001/$DEVICE $WEIGHT
swift-ring-builder object.builder add z$ZONE-$STORAGE_LOCAL_NET_IP:6000/$DEVICE $WEIGHT
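Since the same three commands are repeated for every device, they can be generated rather than typed. The helper below is a hypothetical sketch that only formats the command strings shown above; the zone number and IP address are placeholders (the slides leave them blank), and nothing is executed.

```python
# Ports as used in the swift-ring-builder commands above.
RING_PORTS = {"account": 6002, "container": 6001, "object": 6000}


def ring_add_commands(zone, ip, device, weight=100):
    """Return the three 'add' commands (account, container, object) for one device."""
    return [
        f"swift-ring-builder {ring}.builder add z{zone}-{ip}:{port}/{device} {weight}"
        for ring, port in RING_PORTS.items()
    ]


if __name__ == "__main__":
    # Hypothetical values: zone 1 and a documentation-range IP address.
    for command in ring_add_commands(zone=1, ip="192.0.2.10", device="dev1"):
        print(command)
```

Because accounts, containers and objects are not separated in this setup, every device is added to all three rings.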
• Swift Proxy: HDD1 (OS); remaining three disks not used
• Storage node 0: HDD1 (OS); dev1 (XFS); dev2 (XFS); one disk not used
• Storage node 1: HDD1 (OS); dev3 (XFS); dev4 (XFS); one disk not used
Highlighting a difference
• LibRados used to access Ceph
  • Plain installation of a “Ceph storage cluster”
  • Non ReST-ful interface
  • This is the fundamental access layer in Ceph
  • RadosGW (Swift/S3 APIs) is an additional component on top of LibRados (as are the block and file storage clients)
• ReST-ful APIs over HTTP used to access Swift
  • Extra overhead in the communication
  • Out-of-the-box access method for Swift
• This is part of the differences to be benchmarked, even if...
  • … HTTP APIs for object-storage are interesting for many use cases
  • This use case:
    • Unconstrained self-managed storage infrastructure for, e.g., own apps
    • Control over infrastructure and applications
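To make the HTTP side of this comparison concrete, the sketch below builds (but does not send) an object upload request in the style of the Swift API, where an object lives at `<storage URL>/<container>/<object>` and the request carries an `X-Auth-Token` header. The storage URL, token and names are hypothetical placeholders. The librados path is not shown, since it goes through a native client library and needs a running cluster.

```python
import urllib.parse
import urllib.request


def swift_put_request(storage_url, token, container, name, data):
    """Build an HTTP PUT request for one object; the request is not sent."""
    path = "/".join(urllib.parse.quote(part, safe="") for part in (container, name))
    url = f"{storage_url.rstrip('/')}/{path}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"X-Auth-Token": token},
    )


if __name__ == "__main__":
    # Hypothetical endpoint and token, for illustration only.
    req = swift_put_request(
        "http://proxy.example:8080/v1/AUTH_test", "token123", "c1", "obj 1", b"x" * 4000
    )
    print(req.get_method(), req.full_url)
```

Each operation pays for URL handling, headers and HTTP parsing, which is the kind of per-request overhead the slide refers to.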
Tools
• COSBench (v. 0.4.0.b2) - https://github.com/intel-cloud/cosbench
  • Developed by Intel
  • Benchmarking for Cloud Object Storage
  • Supports both Swift and Ceph
  • Cool web interface to submit workloads and monitor current status
  • Workloads defined as XML files
    • Very good level of abstractions applying to object storage
  • Supported metrics
    • Op-Count (number of operations)
    • Byte-Count (number of bytes)
    • Response-Time (average response time for each successful request)
    • Processing-Time (average processing time for each successful request)
    • Throughput (operations per second)
    • Bandwidth (bytes per second)
    • Success-Ratio (ratio of successful operations)
  • Outputs CSV data
• Graphs generated with cosbench-plot - https://github.com/icclab/cosbench-plot
  • Describe inter-workload charts in Python
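The relationship between the listed metrics can be illustrated with a small sketch. It follows the definitions given in the metric list and is not COSBench's own implementation; the class, field names and sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageSample:
    """Raw counters for one workstage (hypothetical structure)."""
    ops_attempted: int
    ops_succeeded: int            # Op-Count
    bytes_transferred: int        # Byte-Count
    duration_s: float             # measured running time
    total_response_time_s: float  # summed over successful requests

    @property
    def throughput(self) -> float:
        """Operations per second."""
        return self.ops_succeeded / self.duration_s

    @property
    def bandwidth(self) -> float:
        """Bytes per second."""
        return self.bytes_transferred / self.duration_s

    @property
    def success_ratio(self) -> float:
        """Fraction of attempted operations that succeeded."""
        return self.ops_succeeded / self.ops_attempted

    @property
    def avg_response_time_s(self) -> float:
        """Average response time per successful request."""
        return self.total_response_time_s / self.ops_succeeded


if __name__ == "__main__":
    # Made-up numbers: a 600 s stage with 4 kB objects.
    sample = StageSample(1000, 990, 990 * 4000, 600.0, 49.5)
    print(sample.throughput, sample.bandwidth, sample.success_ratio)
```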
Workload matrix
Containers   Object size   R/W/D distr. (%)   Workers
1            4 kB          80/15/5            1
20           128 kB        100/0/0            16
             512 kB        0/100/0            64
             1024 kB                          128
             5 MB                             256
             10 MB                            512
Workloads
• 216 workstages (all the combinations of the values of the workload matrix)
• 12 minutes per workstage
• 2 minutes warmup
• 10 minutes running time
• 1000 objects per container (pools in Ceph)
• Uniformly distributed operations over the available objects (1000 or 20000)
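The workstage count and the object totals follow directly from the workload matrix. The sketch below enumerates the combinations and derives the total benchmark time; the variable names are illustrative, and the total ignores any setup or cleanup time between stages.

```python
from itertools import product

# Values from the workload matrix.
CONTAINERS = [1, 20]
OBJECT_SIZES = ["4 kB", "128 kB", "512 kB", "1024 kB", "5 MB", "10 MB"]
RWD_RATIOS = [(80, 15, 5), (100, 0, 0), (0, 100, 0)]  # read/write/delete, in %
WORKERS = [1, 16, 64, 128, 256, 512]

OBJECTS_PER_CONTAINER = 1000
MINUTES_PER_STAGE = 12  # 2 minutes warmup + 10 minutes running time

# One workstage per combination of the four parameters.
workstages = list(product(CONTAINERS, OBJECT_SIZES, RWD_RATIOS, WORKERS))
total_hours = len(workstages) * MINUTES_PER_STAGE / 60
object_counts = sorted({c * OBJECTS_PER_CONTAINER for c in CONTAINERS})

if __name__ == "__main__":
    print(len(workstages), "workstages,", total_hours, "hours per storage system")
    print("objects available:", object_counts)
```

At 12 minutes per stage, one full pass over the matrix takes a little under two days for each storage system.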
Performance Analysis Recap
• Ceph performs better when reading, Swift when writing
  • Ceph → librados
  • Swift → ReST APIs over HTTP
• More remarkable difference with small objects
  • Less overhead for Ceph
    • Librados
    • CRUSH algorithm
• Comparable performance with bigger objects
  • Network bottleneck at 120 MB/s for read operations
• Response time
  • Swift: greedy behavior
  • Ceph: fairness
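The effect of the 120 MB/s network limit on each object size can be estimated with simple arithmetic. This is a back-of-the-envelope sketch, assuming decimal units (1 kB = 1000 bytes) and a single saturated link; it gives an upper bound on read operations per second, not a measured result.

```python
NETWORK_LIMIT_BYTES_PER_S = 120 * 10 ** 6  # 120 MB/s, from the recap

# Object sizes from the workload matrix, in bytes (decimal units assumed).
OBJECT_SIZES_BYTES = {
    "4 kB": 4_000,
    "128 kB": 128_000,
    "512 kB": 512_000,
    "1024 kB": 1_024_000,
    "5 MB": 5_000_000,
    "10 MB": 10_000_000,
}


def max_ops_per_second(object_size_bytes, limit=NETWORK_LIMIT_BYTES_PER_S):
    """Upper bound on operations per second imposed by the network alone."""
    return limit / object_size_bytes


if __name__ == "__main__":
    for label, size in OBJECT_SIZES_BYTES.items():
        print(f"{label:>8}: at most {max_ops_per_second(size):,.1f} ops/s")
```

With 10 MB objects the link allows only about a dozen reads per second, so both systems converge on the same ceiling. With 4 kB objects the network bound is so high that per-request overhead dominates instead, which fits the larger gap observed for small objects.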
General considerations: challenges
• Equivalency
  • Comparing two similar systems that are not exactly overlapping
  • Creating fair setups (e.g., Journals on additional disks for Ceph)
  • Transposing corresponding concepts
• Configuration
  • Choosing the right/best settings for the context (e.g., number of Swift workers)
• Identifying bottlenecks
  • To be done in advance to create meaningful workloads
• Workloads
  • Run many tests to identify saturating conditions
  • Huge decision space
• Keep up the pace
  • Lots of development going on (new versions, new features)
General considerations: lessons learnt
• Publication medium (blog post)
  • Excellent feedback (e.g., Rackspace developer)
  • Immediate right of reply and “real” comments
• Most important principles
  • Openness
    • Share every bit of information
    • Clear intents, clear justifications
  • Neutrality
    • When analyzing the results
    • When drawing conclusions
• Very good suggestions coming from “you could”, “you should”, “you didn’t” comments
Future work
• Performance evaluation necessary for cloud storage
• More object storage evaluations
  • Interesting because it’s very close to the application level
• Block storage evaluations
  • Very appropriate for IaaS
    • Provide storage resources to VMs
• Seagate Kinetic
  • Possible opportunity to work on a Kinetic setup