What’s Your Shape? 5 Steps to Understanding Your Virtual Workload
Irfan Ahmad CTO
CloudPhysics
What’s Your Shape? 5 Steps to Understanding Your Virtual Workload © 2014 Storage Networking Industry Association and CloudPhysics, Inc. All Rights Reserved.
SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions:
Any slide or slides used must be reproduced in their entirety without modification
The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
Abstract
Is your Virtual Machine (VM) rightsized? Is your VM on the right datastore? Is your VM IO bound?
These are the kinds of questions that come up frequently when you’re doing capacity management, solving performance problems, and making procurement decisions. The root of the answer to all of them is the shape of your workload. Learn how to find the answers to workload shape questions. This tutorial delves into some of the challenges inherent in right-sizing virtual workloads by discovering the shape of your workload and applying that knowledge to capacity and performance decisions.
Is the disk workload sequential or random?
How much parallelism is there?
How do we figure out the shape of a workload?
What are the tools and techniques we can use?
What is the bottleneck resource?
How do we map workloads to the right mix of storage, CPU, memory and network?
The Virtualization Promise
…No Longer Delivering
A New Set of Headaches
Will I really benefit from costly SSD cache?
Will performance suffer if I consolidate more?
How do I properly plan my IT budget for next year?
How do I ensure that we meet our SLAs?
Can’t Datacenters Be…?
Other large operations have powerful tools to design and manage, but datacenters do not.
• Designed like civil engineering: CAD software helps design infrastructure and model costs before building
• Operated like airlines: logistics management software allows for maximizing efficiency
• Predicted like chip design: design automation software allows for testing before costly manufacturing
…Sure They Can!
Example: Storage performance
Predict how an SSD design will perform?
Model the cost of operations and ROI?
Tools now exist to design, predict and operate data centers. And workload shapes are the key!
Flash Overtakes In Mindshare
Interest in flash memory has risen greatly, but it is costly and doesn’t benefit everyone.
SSD has garnered a lot of hype
But does reality live up to the hype?
[Figure: search interest over time for “Solid-state drive” and “SSD”. Source: Google Trends, March 2014]
“The economics of flash memory are staggering. If you’re not using SSD, you are doing it wrong.” – High Scalability
SSDs: A cautionary tale
• Large SSD caching project
  POC completed
  Production deployment completed
• But back-of-the-envelope VM selection
  VMs selected solely on application identity
• Project was a DISASTER
  VMs couldn’t possibly benefit: a tremendous waste
Company Quick Facts
• Light Vehicle Automotive • Publicly traded • Established 1950s
• 4,000+ employees • $3bln+ revenue
Key questions
Will SSDs benefit my datacenter?
Which of my VMs / applications?
How much cache do I need?
Do VMs benefit from SSDs? Depends…
Is the disk even a bottleneck (or is it CPU, memory)?
How do you determine if a VM will benefit from caching?
Detailed workload characterization:
• Outstanding IOs analysis
• Read/write ratio analysis
• Latency analysis
• Cache hit ratio analysis
No simple rule of thumb! No one size fits all
Made-up Example
[Figure: histogram of operation latency. X-axis: latency of an operation (microseconds), bins 1 to 10; y-axis: frequency, 0 to 2500. Callout: “Mean is 5.3!”]
Histograms are much more informative than single numbers like the mean, median, and standard deviation
e.g., multimodal behaviors are easily identified by plotting a histogram, but obscured by a mean
Histograms can be calculated efficiently online
Why take one number if you can have a distribution?
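To make the point about means versus distributions concrete, here is a minimal Python sketch of a histogram that is updated online, one sample at a time. The class name `OnlineHistogram` and the sample latencies are made up for illustration and are not taken from the slides or from any product; the only claim is that a bimodal set of samples can have a mean that lands where almost no samples do.

```python
from bisect import bisect_left


class OnlineHistogram:
    """Fixed-bin histogram updated one sample at a time (hypothetical helper)."""

    def __init__(self, upper_bounds):
        # Sorted inclusive upper limits; one extra overflow bin is kept at the end.
        self.upper_bounds = list(upper_bounds)
        self.counts = [0] * (len(self.upper_bounds) + 1)
        self.total = 0
        self.sum = 0

    def add(self, value):
        # bisect_left finds the first limit >= value, i.e. the "<= limit" bin.
        self.counts[bisect_left(self.upper_bounds, value)] += 1
        self.total += 1
        self.sum += value

    def mean(self):
        return self.sum / self.total


# Made-up bimodal latencies (microseconds): a fast mode at 2 and a slow mode at 9.
samples = [2] * 500 + [9] * 450 + [5] * 10

hist = OnlineHistogram(range(1, 11))  # bins "<= 1" through "<= 10"
for sample in samples:
    hist.add(sample)

print("mean:", round(hist.mean(), 1))
for limit, count in zip(hist.upper_bounds, hist.counts):
    print(f"<= {limit:2d}: {count}")
```

Each update costs one binary search and one counter increment, which is why keeping the whole distribution is cheap. In this made-up data the mean sits near 5, a bin that holds only 10 of the 960 samples.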
Workload Shapes Technique
Workload Shapes Technique
The ESX disk IO workload characterization is done on a per-virtual-disk basis
Allows us to separate out each different type of workload into its own container and observe trends
Technique: For each virtual machine IO request in ESX, we insert some values into histograms
E.g., size of IO request → 4KB
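The technique above can be sketched in a few lines of Python. This is only an illustration of the idea of keeping one histogram per virtual disk and inserting a value for every IO request; the function names, the VM and disk identifiers, and the four bin limits are assumptions made for the example, not the ESX implementation.

```python
from collections import Counter, defaultdict

# Assumed bin limits for illustration (bytes); real collectors use more bins.
BIN_LIMITS = [1024, 2048, 4096, 8192]


def bin_for(size_bytes):
    """Return the smallest bin limit that holds the IO, or an overflow label."""
    for limit in BIN_LIMITS:
        if size_bytes <= limit:
            return limit
    return f">{BIN_LIMITS[-1]}"


# One histogram per (VM, virtual disk, direction); names are hypothetical.
histograms = defaultdict(Counter)


def record_io(vm, vdisk, is_read, size_bytes):
    """Insert one IO request into the 'all' histogram and the read or write one."""
    bucket = bin_for(size_bytes)
    histograms[(vm, vdisk, "all")][bucket] += 1
    histograms[(vm, vdisk, "read" if is_read else "write")][bucket] += 1


record_io("db01", "scsi0:0", True, 4096)   # a 4KB read lands in the 4096 bin
record_io("db01", "scsi0:0", False, 8192)
record_io("db01", "scsi0:1", False, 512)

for key, hist in sorted(histograms.items()):
    print(key, dict(hist))
```

Keeping the key per virtual disk is what lets each type of workload be observed in its own container.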
[Figure: two IO size histograms with bins 1024, 2048, 4096, 8192; data is collected per virtual disk]
Workload Shapes Technique
Read/Write Distributions are available for our histograms
Overall Read/Write ratio?
Are Writes smaller or larger than Reads in this workload?
Are Reads more sequential than Writes?
Which type of IO is incurring more latency?
IO Size: All, Reads, Writes
Seek Distance: All, Reads, Writes
Seek Distance: Shortest Among Last 16
Outstanding IOs: All, Reads, Writes
IO Interarrival Times: All, Reads, Writes
Latency: All, Reads, Writes
IO Length Filebench OLTP
[Figure: two I/O length histograms, UFS and ZFS. X-axis: length (bytes), bins from 512 to >524288; y-axis: frequency]
4K and 8K IO transformed into 128K by ZFS?
Seek Distance Filebench OLTP
[Figure: two seek distance histograms, UFS and ZFS. X-axis: distance (sectors), bins from -500000 to 500000; y-axis: frequency]
Seek distance: a measure of sequentiality versus randomness in a workload
Somehow a random workload is transformed into a sequential one by ZFS!
More details needed ...
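As a rough illustration of how a seek distance can be derived from a stream of IOs, the Python sketch below measures the signed gap between the start of each IO and the end of a previous one. The exact definition used by the collector is not given in the slides, so the formula here (start sector minus the previous IO's end sector) is an assumption, and `seek_distances` is a hypothetical helper. The `window` parameter is a guess at how a "shortest among last 16" variant might work.

```python
from collections import deque


def seek_distances(ios, window=1):
    """Yield a signed seek distance (in sectors) for each IO after the first.

    ios: iterable of (start_sector, length_in_sectors) tuples.
    window: number of previous IOs to compare against; the distance with the
    smallest magnitude is reported.
    """
    recent_ends = deque(maxlen=window)
    for start, length in ios:
        if recent_ends:
            yield min((start - end for end in recent_ends), key=abs)
        recent_ends.append(start + length)


sequential = [(0, 8), (8, 8), (16, 8), (24, 8)]
random_io = [(0, 8), (500000, 8), (1200, 8), (90000, 8)]
# Two sequential streams interleaved on the same disk.
interleaved = [(0, 8), (1000, 8), (8, 8), (1008, 8)]

print(list(seek_distances(sequential)))               # all zeros
print(list(seek_distances(random_io)))                # large jumps both ways
print(list(seek_distances(interleaved)))              # looks random
print(list(seek_distances(interleaved, window=16)))   # streams recovered
```

With a window of one, the interleaved streams look random; comparing against several recent IOs exposes the sequential streams hiding inside the mix.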
Seek Distance Filebench OLTP—More Detailed
Split out reads & writes
[Figure: four seek distance histograms, writes and reads for each of UFS and ZFS. X-axis: distance (sectors), bins from -500000 to 500000; y-axis: frequency]
Transformation from random to sequential: primarily for writes
Reads: seek distance is reduced (look at histogram shape & scales)
Filebench OLTP Summary
So, what have we learnt about Filebench OLTP?
• IO is primarily 4K but 8K isn’t uncommon (~30%)
• Access pattern is mostly random
  Reads are entirely random
  Writes do have a forward-leaning pattern
• ZFS is able to transform random writes into sequential:
  Aggressive IO scheduling
  Copy-on-write (COW) technique (blocks on disk not modified in place)
  Changes to blocks from app writes are written to alternate locations
  Stream otherwise random data writes to a sequential pattern on disk
• Performed this detailed analysis in just a few minutes
OSDL Database Test 2 (Linux 2.6.17-10) Analysis
Workload is primarily random (big spikes towards the right and left edges of the graph)
Still, many IOs are within 500 sectors (20%) or within 5,000 sectors (33%) of the previous command
The workload is almost exclusively 8K for both reads and writes
[Figure: seek distance histogram (writes). X-axis: distance (sectors), bins from -500000 to 500000; y-axis: frequency]
[Figure: I/O length histogram. X-axis: length (bytes), bins from 512 to >524288; y-axis: frequency]
OSDL Database Test 2 (Linux 2.6.17-10) Analysis (2)
The number of outstanding IOs is very different in this workload between reads and writes
PostgreSQL almost always issues 32 write IOs simultaneously
IO rate from this workload varies over time by as much as 15% over a 2-minute period
[Figure: outstanding I/Os histogram (reads, writes). X-axis: I/Os outstanding at arrival time, bins from 1 to >64; y-axis: frequency]
[Figure: outstanding I/Os histogram over time. Axes: I/Os outstanding at arrival time, time (in 6-second intervals), frequency]
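For readers who want to see what "I/Os outstanding at arrival time" means operationally, the Python sketch below computes it from a list of issue and completion timestamps. The trace format and the helper name `outstanding_at_arrival` are assumptions for illustration. Whether the arriving IO counts itself is also an assumption; here it does not.

```python
import heapq


def outstanding_at_arrival(ios):
    """For each IO in issue order, count earlier IOs still in flight.

    ios: list of (issue_time, completion_time) tuples in any order.
    """
    in_flight = []  # min-heap of completion times for IOs not yet retired
    result = []
    for issue, complete in sorted(ios):
        # Retire every IO that completed at or before this arrival.
        while in_flight and in_flight[0] <= issue:
            heapq.heappop(in_flight)
        result.append(len(in_flight))
        heapq.heappush(in_flight, complete)
    return result


# Hypothetical trace: a burst of four writes issued together, then a lone read.
trace = [(0, 10), (0, 10), (0, 10), (0, 10), (20, 25)]
print(outstanding_at_arrival(trace))
```

A workload that issues writes in large simultaneous batches shows up as counts that climb within each burst, while isolated IOs see an empty queue.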
OSDL Database Test 2 (Linux 2.6.17-10) Summary
On the aggregate the workload appears random
  But 20% of IOs are within 250KB and 33% are within 2.4MB!
IO size is 8K for both reads and writes
Outstanding IOs very different between reads and writes
  PostgreSQL almost always issues 32 write IOs simultaneously
IO rate varies over time (up to 15%)
Don’t assume that every database workload behaves the same; measure and determine for yourself
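The 250KB and 2.4MB figures follow from the 500-sector and 5,000-sector seek distances quoted in the analysis, assuming 512-byte sectors (the sector size is not stated in the slides). A small worked check in Python:

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors


def sectors_to_bytes(sectors):
    """Convert a seek distance in sectors to bytes."""
    return sectors * SECTOR_BYTES


near_kb = sectors_to_bytes(500) / 1024            # 500 sectors in KB
far_mb = sectors_to_bytes(5000) / (1024 * 1024)   # 5,000 sectors in MB

print(f"500 sectors  = {near_kb:.0f} KB")
print(f"5000 sectors = {far_mb:.2f} MB")
```

So "within 500 sectors" means within 250KB of the previous command, and "within 5,000 sectors" means within roughly 2.4MB.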
Use Cases for Workload Shapes
Analyzing a new disk-performance-sensitive workload
Tuning of underlying disk subsystem
How to interpret: pay attention to changes in distribution shape as well as magnitude
Which metrics to start with:
• IO Size
• Read/Write Ratios
• Outstanding IOs
Corrective actions: tune disk subsystem and re-measure; pay attention to latency histogram
Workload shape matters
Bursty writes
Steady read traffic
The read/write ratio is highly biased towards reads.
8K reads and writes
Bimodal spatial locality
Understanding application IO patterns is the first step in predicting SSD benefits.
Limitation of Simple Shape Analysis
Strength: allows deep analysis of separate flavors of workloads in each VM by splitting workloads by virtual disk
• Place DB redo logs on a separate virtual disk from the DB tablespaces
Weakness: doesn’t give a complete picture of IO going to a storage array
• Many VMs might be doing IO from the same ESX host
• VMs from different ESX hosts might be doing IO
• In general, it is a hard problem to figure out
• Rule of thumb: IO to a LUN from different apps is effectively random
• Still: storage arrays are often smart enough to pick out individual sequential streams and schedule IO per stream
Disk Access Traces Matter Even More
(Source: USENIX ’06)
Knowing patterns isn’t enough. Exact IO sequences are required.
Algorithms to the rescue
⊕ ⇓ Hit Ratio Curves
Data access patterns, IO sequences and complex analytics allow for maximizing ROI of SSD cache.
Big gains at ~500MB and 2200MB, but little in between.
⊕ Simulation Prediction
Algorithms
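The slides do not say which algorithm produces the hit ratio curves. As one well-known way to get such a curve from an exact IO sequence, the Python sketch below uses LRU stack (reuse) distances in the style of Mattson's stack algorithm. It is a deliberately simple, quadratic-time illustration; the function name and the tiny trace are made up.

```python
from collections import Counter


def lru_hit_ratio_curve(trace, cache_sizes):
    """Return {cache_size: hit_ratio} for an LRU cache over a block trace.

    trace: sequence of block identifiers in access order.
    cache_sizes: cache capacities, measured in blocks.
    """
    stack = []             # most recently used block sits at index 0
    distances = Counter()  # stack distance -> number of accesses
    for block in trace:
        if block in stack:
            # A re-access at depth d hits in any LRU cache holding >= d blocks.
            distances[stack.index(block) + 1] += 1
            stack.remove(block)
        stack.insert(0, block)
    total = len(trace)
    return {
        size: sum(n for depth, n in distances.items() if depth <= size) / total
        for size in cache_sizes
    }


trace = [1, 2, 3, 1, 2, 3, 4, 1]  # made-up block accesses
curve = lru_hit_ratio_curve(trace, [1, 2, 3, 4])
print(curve)
```

In this toy trace a cache of one or two blocks gives no hits at all, while three blocks captures most of the reuse. Real workloads show the same kind of plateaus and steps, which is why the exact IO sequence matters and not just the aggregate pattern.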
SSDs: A success story
• 100s of VMs analyzed
• 16% showed benefit from server-side SSD cache
• Improvement ranged from 50% to 200% better response times
• Hit Ratio Curve derived cache size recommendations ranged from 1GB - 512GB (VM-by-VM basis)
Company Quick Facts
• Boston-area hedge fund • International operations • Assets >$20B
• Established 1980s • 50+ employees
Successfully identified the VMs that benefit from SSD cache: SSDs in 16% of VMs.
More success: SSDs in 3% of VMs
• 100s of virtual machines simulated
• 3% of VMs had 50% or higher improvement
• Hit Ratio Curves derived recommendations ranged from 1GB – 512GB (VM-by-VM basis)
• Customer installed 2 PCIe Flash cards to get maximum benefit via a strategic installation
Company Quick Facts
• Public University (Boston area)
• Established 1850s
• 10,000 students
• 1300+ employees
Even smaller deployments benefit.
APPENDIX
Attribution & Feedback
Please send any questions or comments regarding this SNIA Tutorial to [email protected]
The SNIA Education Committee thanks the following individuals for their contributions to this Tutorial.
Authorship History
Original author: Irfan Ahmad, CloudPhysics, April 17, 2014
Additional Contributors
Workload Characterization Technique
To make the histograms practical, bin sizes are on rather irregular scales
E.g., the IO length histogram bin ranges look like this: …, 2048, 4095, 4096, 8191, 8192, …
Rather odd: some buckets are big and others are as small as just 1
Certain block sizes are really special since the underlying storage subsystems may optimize for them; single those out from the start (else lose that precise information)
E.g., important to know if the IO was 16KB or some other size in the interval (8KB,16KB)
[Figure: IO length histogram x-axis with bins 2048, 4095, 4096, 8191, 8192, 16383, 16384, 32768]
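The effect of the irregular bins can be seen in a short Python sketch. The bucket limits below are copied from the iolength output shown in the "How to Run" slide; the lookup function `bucket_label` is a hypothetical helper written for this example, not part of any tool.

```python
from bisect import bisect_left

# Bucket limits as they appear in the iolength histogram output (bytes).
IO_LENGTH_LIMITS = [
    512, 1024, 2048, 4095, 4096, 8191, 8192, 16383, 16384, 32768,
    49152, 65535, 65536, 81920, 131072, 262144, 524288,
]


def bucket_label(length):
    """Return the label of the bucket an IO of this length falls into."""
    index = bisect_left(IO_LENGTH_LIMITS, length)
    if index == len(IO_LENGTH_LIMITS):
        return f"> {IO_LENGTH_LIMITS[-1]}"
    return f"<= {IO_LENGTH_LIMITS[index]}"


for length in (4096, 3000, 16384, 12288, 600000):
    print(length, "->", bucket_label(length))
```

Because 4095 and 16383 are limits of their own, an exact 4KB or 16KB IO lands in a one-value-wide bucket and is never mixed with odd sizes just below it.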
Windows File Copy
[Figure: I/O length histogram, Vista Enterprise vs. XP Pro. X-axis: length (bytes), bins from 512 to >524288; y-axis: frequency]
[Figure: seek distance histogram, Vista Enterprise vs. XP Pro. X-axis: distance (bytes), bins from -500000 to 500000; y-axis: frequency]
XP issues 64KB IOs
IOs are largely sequential.
Vista is issuing very large IOs (1MB)
Number of commands is lower
IOs are very sequential
Latency is higher
Vista enables large IOs to be issued; file copy is just an example
Keep an eye out for increasing IO sizes in future workloads
How to Run
$ /usr/lib/vmware/bin/vscsiStats -p iolength
Histogram: IO lengths of commands {
min : 512
max : 32768
mean: 11731
count : 241
{
5 (<= 512)
14 (<= 1024)
5 (<= 2048)
17 (<= 4095)
76 (<= 4096)
1 (<= 8191)
20 (<= 8192)
18 (<= 16383)
36 (<= 16384)
49 (<= 32768)
0 (<= 49152)
0 (<= 65535)
0 (<= 65536)
0 (<= 81920)
0 (<= 131072)
0 (<= 262144)
0 (<= 524288)
0 (> 524288)
}
}
$ /usr/lib/vmware/bin/vscsiStats -p latency
Histogram: latency of IOs in Microseconds (us) {
min : 191
max : 13391
mean: 598
count : 288
{
0 (<= 1)
0 (<= 10)
0 (<= 100)
248 (<= 500)
28 (<= 1000)
4 (<= 5000)
8 (<= 15000)
0 (<= 30000)
0 (<= 50000)
0 (<= 100000)
0 (> 100000)
}
}
Bin Ranges (Bucket Limits). Think x-axis of histogram plots
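Output in this shape is easy to post-process. The Python sketch below parses the latency histogram text exactly as printed in the slide and computes the share of IOs at or below a latency limit. It only handles the text layout shown here; the regular expressions and helper names are assumptions for illustration, and other versions of the tool may print something different.

```python
import re

# Sample text copied from the latency output shown in the slide.
SAMPLE = """\
Histogram: latency of IOs in Microseconds (us) {
min : 191
max : 13391
mean: 598
count : 288
{
0 (<= 1)
0 (<= 10)
0 (<= 100)
248 (<= 500)
28 (<= 1000)
4 (<= 5000)
8 (<= 15000)
0 (<= 30000)
0 (<= 50000)
0 (<= 100000)
0 (> 100000)
}
}
"""

SUMMARY = re.compile(r"^\s*(min|max|mean|count)\s*:\s*(\d+)\s*$")
BUCKET = re.compile(r"^\s*(\d+)\s+\((<=|>)\s*(\d+)\)\s*$")


def parse_histogram(text):
    """Return (summary dict, list of (operator, limit, count)) from the text."""
    summary, buckets = {}, []
    for line in text.splitlines():
        if m := SUMMARY.match(line):
            summary[m.group(1)] = int(m.group(2))
        elif m := BUCKET.match(line):
            buckets.append((m.group(2), int(m.group(3)), int(m.group(1))))
    return summary, buckets


def fraction_at_or_below(buckets, limit):
    """Share of IOs in buckets whose upper limit is at or below `limit`."""
    total = sum(count for _, _, count in buckets)
    within = sum(c for op, bound, c in buckets if op == "<=" and bound <= limit)
    return within / total


summary, buckets = parse_histogram(SAMPLE)
print(summary)
print(f"{fraction_at_or_below(buckets, 1000):.1%} of IOs took <= 1000 us")
```

The bucket counts add up to the reported count of 288, which is a cheap sanity check that the parse picked up every line.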
Performance Overhead of Stats Collection
Overhead is negligible (tested on internal build)
Used Iometer to generate 4KB sequential reads
16 outstanding IOs on a Windows 2003 Enterprise Edition 64-bit VM
4KB is the most realistic worst-case scenario for overheads
Online Histo Service            Disabled   Enabled
IOps                            8187       8137
IOps Std. Dev.                  6.5        200
MBps                            35.1       34.8
CPU (out of 800)                106.0      108.0
CPU Std. Dev.                   2.7        4.8
CPU Efficiency (UsedSec/IOps)   0.0417     0.0424
Latency (ms)                    1.6        1.6
Table 2. Microbenchmark Performance