Analyzing Performance in the Cloud
Solving an elastic problem with a scientific approach

Nicholas Wakou (Dell EMC), Alex Krzos (Red Hat)
Thursday, October 27, 2016
Barcelona OpenStack Summit 2016
Presenters
Alex Krzos is a Senior Performance Engineer at Red Hat working on OpenStack.
[email protected]
https://www.openstack.org/summit/barcelona-2016/summit-schedule/events/16204/analyzing-performance-in-the-cloud-solving-an-elastic-problem-with-a-scientific-approach
Nicholas Wakou is a Principal Performance Engineer with the Dell EMC Open Source team.
[email protected]
Agenda
➢ CLOUD DEFINITION & CHARACTERISTICS
➢ PERFORMANCE MEASURING TOOLS
➢ SPEC CLOUD IaaS 2016 BENCHMARK
➢ PERFORMANCE MONITORING TOOLS
➢ PERFORMANCE CHARACTERIZATION
➢ TUNING TIPS
CLOUD DEFINITION & CHARACTERISTICS
DEFINING A CLOUD
NIST SPECIAL PUBLICATION 800-145
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf
CLOUD CHARACTERISTICS
PERFORMANCE MEASURING TOOLS
RALLY
OpenStack Benchmarking Tool

➢ as-an-App and as-a-Service
➢ Verification
➢ Benchmarking
➢ Profiling
➢ Reports
➢ SLAs for Benchmarks
➢ Many plugins
Source: What is Rally?, https://rally.readthedocs.io/en/latest/
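Rally scenarios are driven by task files. The sketch below generates a minimal task file for Rally's NovaServers.boot_and_delete_server scenario, sketched from memory of the Rally task format; the flavor and image names are placeholders you would replace with ones from your cloud.

```python
import json

# Minimal Rally task (JSON form): boot 50 servers, 5 at a time, delete each,
# and fail the run if any iteration errors. Flavor/image names are placeholders.
task = {
    "NovaServers.boot_and_delete_server": [{
        "args": {"flavor": {"name": "m1.small"},
                 "image": {"name": "cirros"}},
        "runner": {"type": "constant", "times": 50, "concurrency": 5},
        "sla": {"failure_rate": {"max": 0}},
    }]
}

with open("boot-and-delete.json", "w") as f:
    json.dump(task, f, indent=2)
```

Run it with `rally task start boot-and-delete.json`, then `rally task report` for an HTML report.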
PERFKIT BENCHMARKER
Source: Introduction to Perfkit Benchmark and How to Extend it, https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/wiki/Tech-Talks
Open-source living benchmark framework containing a set of benchmarks used to compare cloud offerings/environments

➢ 10+ Cloud Providers/Environments
➢ 34+ Benchmarks
➢ Large Community Involvement
➢ Captures Cloud Elasticity with Benchmark Results
➢ Uses Cloud/Environment CLI Tooling
➢ Publishes Results to BigQuery for Comparison
PERFKIT EXPLORER
Source: https://github.com/GoogleCloudPlatform/PerfKitExplorer
Dashboarding and Performance Analysis Tool for PerfKitBenchmarker Results

➢ Multiple Chart Options
➢ Uses BigQuery as backend data-store
➢ Hosted in Google App Engine
CLOUDBENCH
➢ Framework that automates cloud-scale evaluation and benchmarking
➢ Benchmark Harness
▪ Requests the Cloud Manager to create instance(s)
▪ Submits a configuration plan and steps to the Cloud Manager on how the test will be performed
▪ At the end of the test, collects and logs applicable performance data and logs
▪ Destroys instances no longer needed for the test
HARNESS AND WORKLOAD CONTROL
Benchmark Harness Cloud SUT
Group of boxes represents an application instance
The Benchmark Harness comprises CloudBench (CBTOOL), baseline/elasticity drivers, and report generators.
For white-box clouds the benchmark harness is outside the SUT. For black-box clouds, it can be in the same location or campus.
BROWBEAT
Orchestration tool for existing OpenStack Workloads
➢ Combines Workloads, Metrics, and Results into a single tool
➢ Runs Performance Workloads:
▪ Rally - Control Plane
▪ Rally Plugins & Rally+pBench Plugins - Control+Data Plane
▪ Shaker - Network Data Plane
▪ PerfKitBenchmarker - Data Plane + Cloud Elasticity
➢ Provides Performance Infrastructure Installation and Configuration for:
▪ Carbon/Graphite/Grafana
▪ Collectd
▪ ELK
▪ FluentD
➢ Provides dashboards for Visualizing and Comparing Results and System Performance Metrics
BROWBEAT - RESULTS
BROWBEAT - Metrics
SPEC CLOUD IAAS 2016 BENCHMARK
➢ Measures performance of Infrastructure-as-a-Service (IaaS) Clouds
➢ Measures both control and data plane
▪ Control: management operations, e.g., instance provisioning time
▪ Data: virtualization, network performance, runtime performance
➢ Uses workloads that
▪ resemble “real” customer applications
▪ benchmark the cloud, not the application
➢ Produces metrics (“elasticity”, “scalability”, “provisioning time”) which allow comparison
SPEC Cloud IaaS Benchmarking: Dell Leads the Way
http://en.community.dell.com/techcenter/cloud/b/dell-cloud-blog/archive/2016/06/24/spec-cloud-iaas-benchmarking-dell-leads-the-way
Scalability and Elasticity Analogy
Climbing a mountain

[Figure: a climber on a mountain. Scalability – conquering an infinitely high mountain. Elasticity – time for each step.]

IDEAL
Scalability
• Mountain: Keep on climbing
• Cloud: Keep on adding load without errors
Elasticity
• Mountain: Each step takes identical time
• Cloud: Performance within limits as load increases
WHAT IS MEASURED?
➢ Measures the number of AIs that can be loaded onto a Cluster before SLA violations occur
➢ Measures the scalability and elasticity of the Cloud under Test (CuT)
➢ Not a measure of Instance density
➢ SPEC Cloud workloads can individually be used to stress the CuT:
▪ KMeans – CPU/Memory ▪ YCSB - IO
SPEC CLOUD BENCHMARK PHASES

Baseline Phase
▪ Determine the results for a single application instance of a workload
▪ AI = stream of 5 runs
KMeans baseline AI
YCSB baseline AI
Elasticity Phase
Determine cloud elasticity and scalability results when multiple workloads are run
BENCHMARK STOPPING CONDITIONS
➢ 20% AIs fail to provision
➢ 10% AIs have errors in any run
➢ Max number of AIs set by Cloud Provider
➢ 50% AIs have QoS violations
▪ KMeans completion time ≤ 3.33 x Baseline phase
▪ YCSB Throughput ≥ Baseline throughput / 3
▪ YCSB Read Response Time ≤ 20 x Baseline Read Response Time
▪ YCSB Insert Response Time ≤ 20 x Baseline Insert Response Time
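The stopping conditions above translate directly into plain comparisons. A sketch with my own variable names, assuming per-AI results and baseline averages are available as dictionaries:

```python
def ai_within_qos(run, baseline):
    """True if one AI run meets all four QoS conditions listed above."""
    return (run["kmeans_completion_s"] <= 3.33 * baseline["kmeans_completion_s"]
            and run["ycsb_throughput"] >= baseline["ycsb_throughput"] / 3
            and run["ycsb_read_ms"] <= 20 * baseline["ycsb_read_ms"]
            and run["ycsb_insert_ms"] <= 20 * baseline["ycsb_insert_ms"])

def stop_benchmark(ais_failed, ais_errored, ais_qos_violating, total_ais):
    """Stopping conditions: 20% provisioning failures, 10% run errors,
    or 50% QoS violations (the provider-set max-AI limit is not modeled)."""
    return (ais_failed > 0.20 * total_ais
            or ais_errored > 0.10 * total_ais
            or ais_qos_violating > 0.50 * total_ais)
```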
HIGH LEVEL REPORT SUMMARY
PUBLISHED RESULTS WEBSITE
https://www.spec.org/cloud_iaas2016/results/cloudiaas2016.html
PERFORMANCE MONITORING TOOLS
CEILOMETER
Source: http://docs.openstack.org/developer/ceilometer/architecture.html
Another familiar OpenStack project
➢ https://wiki.openstack.org/wiki/Telemetry
➢ Goal is to efficiently collect, normalize and transform data produced by OpenStack services
➢ Interacts directly with the OpenStack services through defined interfaces
➢ Applications can leverage Ceilometer to gather OpenStack performance data
COLLECTD/GRAPHITE/GRAFANA
➢ Collectd
▪ Daemon to collect System Performance Statistics
▪ Plugins for CPU, Memory, Disk, Network, Process, …
➢ Graphite/Carbon
▪ Carbon receives metrics and flushes them to whisper database files
▪ Graphite is the webapp frontend to Carbon
➢ Grafana
▪ Visualizes metrics from multiple backends
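Besides collectd's write_graphite plugin, any script can push ad-hoc readings into this stack: Carbon's plaintext listener (TCP port 2003 by default) accepts one metric per line. A sketch; the metric path shown is made up.

```python
import socket
import time

def carbon_line(path, value, timestamp=None):
    """Format one metric in Carbon's plaintext protocol:
    '<metric.path> <value> <unix-timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)

def send_metric(host, path, value, port=2003):
    # Carbon's plaintext listener defaults to TCP port 2003.
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(carbon_line(path, value).encode())

# e.g. send_metric("graphite.example.com", "cloud.node1.cpu.user", 12.5)
```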
GANGLIA

Ganglia is a scalable, distributed monitoring system for high-performance computing systems such as server nodes, clusters, and grids.
- Relatively easy to set up
- Tracks many hardware-centric metrics
- Low operational burden
PERFORMANCE CHARACTERIZATION
PROVISIONING TIME: SPEC CLOUD
➢ The time needed to bring up a new instance, or add more resources (like CPU or storage) to an existing instance
➢ Instance: time FROM request to create a new instance TO the time when the instance responds to a netcat probe on port 22
➢ Application instance: time FROM request to create a new instance TO the time when the AI reports readiness to accept client requests

➢ Provisioning Time Characterization using the Baseline phase
➢ Increase the number of VMs (vary YCSB seeds, KMeans and/or Hadoop slaves) and note the impact on provisioning time
➢ Vary instance configuration (flavor)
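The instance-provisioning measurement is easy to reproduce outside the benchmark: timestamp the create request, then poll until the guest answers on port 22 (the same signal as the netcat probe). A minimal sketch:

```python
import socket
import time

def wait_for_port(host, port, timeout=300.0, interval=1.0):
    """Poll until TCP `port` accepts a connection; return elapsed seconds,
    or None if the port never came up within `timeout`."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            # Success means the guest's sshd (or any listener) is answering.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            time.sleep(interval)
    return None
```

Call it right after the `nova boot` request returns; the value it yields is the instance provisioning time as defined above.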
PCI-E Limits
For PCI-E Gen-3 capable slots. http://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/

SAS Limit
An LSI whitepaper, Switched SAS: Sharable, Scalable SAS Infrastructure, http://www.abacus.cz/prilohy/_5025/5025548/SAS_Switch_White%20Paper_US-EN_092210.pdf
IO LIMITS
NETWORK/IO CHARACTERIZATION
➢ Understand network utilization under load
➢ Management networks
➢ Data networks (Neutron tenant)
➢ Monitor with Ganglia, collectd, Linux tools (vmstat, iostat etc)
➢ SPEC Cloud YCSB Baseline tests – Throughput (ops/s)
➢ Vary number of Seeds
➢ Increase number of YCSB records and operations
➢ Increase number of YCSB threads
➢ CloudBench fio
➢ CloudBench Netperf
CPU CHARACTERIZATION
➢ Use SPEC Cloud Baseline tests for CPU Characterization
➢ Vary number of Hadoop slaves
➢ Increase sample size, number of dimensions, number of clusters
➢ Understand CPU utilization under load
➢ Monitor with Ganglia, collectd, Grafana
➢ Linux tools (top, vmstat), SPEC Cloud, KMeans

Note:
✓ CPU user time
✓ CPU system time
✓ CPU iowait time
✓ CPU irq time
SCALABILITY/ELASTICITY
➢ Understand Scalability/Elasticity of the CuT
➢ SPEC Cloud Elasticity phase
➢ Vary number of AIs
➢ Monitor with FDR html report
TUNING TIPS
HARDWARE/OS TUNING
➢ Latest BIOS and Firmware revs
➢ Appropriate BIOS settings
➢ RAID/JBOD
➢ Disk controller
➢ NIC driver - interrupt coalescing and affinitization
➢ NIC bonding
➢ NIC jumbo frames
➢ OS configuration settings
CLOUD TUNING
▪ HW/OS Tuning
▪ Cloud Configs/Settings
▪ Workload tuning
INSTANCE CONFIGURATION
Performance is impacted by
▪ Instance type (flavor)
▪ Number of Instances
OVER-SUBSCRIPTION
Beware of over-subscription !!!
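In Nova, over-subscription is governed by the allocation-ratio settings; cpu_allocation_ratio has long defaulted to 16.0 and ram_allocation_ratio to 1.5. A quick sketch of what those ratios mean for scheduling capacity:

```python
def schedulable_vcpus(physical_cores, cpu_allocation_ratio=16.0):
    """vCPUs the Nova scheduler will place on a host
    (default cpu_allocation_ratio is 16.0)."""
    return int(physical_cores * cpu_allocation_ratio)

def schedulable_ram_mb(physical_ram_mb, ram_allocation_ratio=1.5):
    """RAM the scheduler will place on a host (default ratio 1.5)."""
    return int(physical_ram_mb * ram_allocation_ratio)
```

A 24-core compute node therefore advertises 384 schedulable vCPUs by default: harmless for mostly idle instances, painful for CPU-bound benchmark workloads. Tune the ratios to match your workload mix.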
LOCAL STORAGE
Use of local storage instead of shared storage (like Ceph) could improve performance by over 50% ... depending on Ceph replication.

Source: OpenStack: Install and configure a storage node - OpenStack Kilo. http://docs.openstack.org/kilo/install-guide/install/yum/content/cinder-install-storage-node.html (2015)
NUMA NODES
Pinning instance vCPUs to physical CPUs (respecting NUMA topology) on local storage further improves performance.
Source: Red Hat: CPU pinning and NUMA topology awareness in OpenStack compute. http://redhatstackblog.redhat.com/2015/05/05/cpu-pinning-and-numa-topology-awareness-in-openstack-compute/ (2015)
DISK PINNING
Disk Pinning shows a 15% performance improvement
Source: OpenStack: OpenStack Cinder multi-backend. https://wiki.openstack.org/wiki/Cinder-multi-backend (2015)
WORKER COUNT CONFIGURATION
Allow services to use available resources with higher concurrency

Examples:
Keystone Process Count
Neutron Workers
Glance Workers
Gnocchi API Workers
UNEVEN CONTROLLER USAGE
One controller had more cores available than the other two and ended up with all the jobs. This scenario was identified easily because the correct dashboarding was in place.
HEAT MEMORY USAGE
About 1GB of memory used by Heat for every 10 compute nodes deployed. Size your controller memory appropriately.
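The rule of thumb above reduces to simple arithmetic when budgeting controller memory for Heat:

```python
def heat_memory_gb(compute_nodes, gb_per_10_nodes=1.0):
    """~1 GB of Heat memory per 10 compute nodes deployed (rule of thumb
    from observed deployments; adjust the ratio for your environment)."""
    return compute_nodes / 10.0 * gb_per_10_nodes
```

So a 250-node deployment implies roughly 25 GB for Heat alone, on top of everything else the controllers run.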
DEPLOYMENT TIMINGS
OSPD 9 Overcloud Deployment
CONCLUSION
➢ Define what you are trying to measure
▪ Define a cloud
▪ Define what metrics are important
➢ Use the correct tools
▪ Rally
▪ PerfKitBenchmarker
▪ Cloudbench
▪ SPEC Cloud IaaS 2016 Benchmark
▪ Ceilometer
▪ Collectd/Graphite/Grafana
▪ Ganglia
▪ Browbeat
➢ Gather and analyze data
▪ Apply tuning tips based on the data
PARTICIPATE!
ADDITIONAL INFORMATION
➢ Guidelines and Considerations for Performance and Scaling your Red Hat Enterprise Linux OpenStack Platform 6 Cloud
▪ https://access.redhat.com/articles/1507893
➢ Guidelines and Considerations for Performance and Scaling your Red Hat Enterprise Linux OpenStack Platform 7 Cloud
▪ https://access.redhat.com/articles/2165131
➢ Red Hat OpenStack Blog
▪ http://redhatstackblog.redhat.com/
➢ Red Hat Developer Blog
▪ http://developerblog.redhat.com/
➢ Red Hat Enterprise Linux Blog
▪ http://rhelblog.redhat.com/
Rally
Source: https://github.com/openstack/rally/blob/master/doc/source/images/Rally-Actions.png
Rally
Rally is a familiar OpenStack project
▪ https://github.com/openstack/rally
▪ An automated benchmark tool for OpenStack

Benchmarking
▪ Multiple use cases
• Development and QA
• DevOps
• CI/CD
BROWBEAT
➢ Scale and Performance automation
➢ Ansible playbooks for automation
➢ Provides automation wrapper around existing tooling
▪ Rally - Control plane tests
▪ Shaker - Data plane network tests
▪ Perfkit - Data plane tests
➢ Leverages existing upstream test frameworks rather than replacing them
➢ Performance Monitoring
▪ Collectd/Graphite/Grafana
➢ Results Capture/Storage/Analytics
▪ ELK stack
▪ Allows for results comparison
COLLECTD/GRAPHITE/GRAFANA
Example Grafana dashboards
DEPLOYMENT TIMINGS
Saw many instance reschedules with default scheduler. Deployment time dropped dramatically by setting up assignments via ironic.
DEFINING A CLOUD
Cloud = Private
Cloud = OpenStack
Cloud = Rain
Cloud = Cumulus
Cloud = Public
Cloud = Funnel
Cloud = OpenShift
Cloud = Community
Cloud = Cirrus
Ten different people will probably give you ten different answers
DEFINING A CLOUD
http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf
NIST SPECIAL PUBLICATION 800-145
Private cloud
The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.
Public cloud
The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.
Hybrid cloud
The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
DEFINING A CLOUD
SPEC OSG Cloud Subcommittee Glossary
Blackbox Cloud
A cloud provider provides a general specification of the SUT, usually in terms of how the cloud consumer may be billed. The exact hardware details corresponding to these compute units may not be known. This will typically be the case if the entity benchmarking the cloud is different from a cloud provider.
Whitebox Cloud
The SUT’s exact engineering specifications including all hardware and software are known and under the control of the tester. This will typically be the case for private clouds.
Source: https://www.spec.org/cloud_iaas2016/docs/faq/html/glossary.html
DEFINING A CLOUD
➢ The focus of this presentation will be predominantly on white box private cloud environments
➢ Primary example is OpenStack
➢ Many of the tools and methodologies are usable in the other cloud environments as well
CLOUD CHARACTERISTICS
SPEC RESEARCH GROUP - CLOUD WORKING GROUP
https://research.spec.org/working-groups/rg-cloud-working-group.html
READY FOR RAIN? A VIEW FROM SPEC RESEARCH ON THE FUTURE OF CLOUD METRICS
https://research.spec.org/fileadmin/user_upload/documents/rg_cloud/endorsed_publications/SPEC-RG-2016-01_CloudMetrics.pdf
ELASTICITY

The degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible.

Source: Ready for Rain? A View from SPEC Research on the Future of Cloud Metrics, SPEC RG Cloud Working Group
ELASTICITY
Source: http://www.today.com/news/remember-stretch-armstrong-how-buy-your-favorite-retro-toys-your-1D80377927
HOW FAR WILL HE STRETCH? WILL HE BREAK WHEN STRETCHED?
AS YOU STRETCH HIM DOES IT GET HARDER TO STRETCH HIM MORE?
WHEN I LET GO DOES HE RETURN TO HIS ORIGINAL SHAPE?
HOW LONG DOES HE TAKE TO RETURN TO HIS NORMAL SHAPE?
RESULTS COMPARED
# Submissions: 2

Submission ID            Dell_12g               Dell_13g               Comment
Cloud Type               Private / White box    Private / White box
Hardware Platform        12g, 7x R720 Compute   13g, 9x R630 Compute
Job Date                 03/05/2016             06/08/2016

Metrics
Scalability@AIs          10.3@10                29.5@20                Higher is better
Scalability per AI       1.03                   1.45                   Higher is better
Elasticity               63.0%                  71.9%                  Higher is better
Inst. Prov. Time (s)     163                    135                    Lower is better
AI Prov. Success         100%                   86.96%                 Higher is better
AI Run Success           100%                   100%                   Higher is better
Total Instances          65                     131                    Higher is better

Baseline Phase
YCSB Throughput          13,082.6               17,742.0               Higher is better
KMeans Job time (s)      115.7                  109.7                  Lower is better

Elasticity Phase
YCSB Throughput          9,480.9                14,890.8               Higher is better
KMeans Job time (s)      211.5                  186.2                  Lower is better
BENCHMARK HARNESS
SUPPORTED WORKLOADS
BROWBEAT
REPEATABLE AUTOMATED TESTING
PROVISIONING TIME: RALLY
Automated VM provisioning
Nova Success rate

Source: Measuring the Cloud Using Rally & CloudBench, Douglas Shakshober, Red Hat Inc.
PCI-E Limits
For PCI-E Gen-3 capable slots. (http://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/)

▪ Gen-3 signals at 8 Gbps per lane; with scrambling and 128b/130b encoding (instead of 8b/10b encoding) nearly all of that is usable bandwidth per lane, so for example a PCI-E Gen-3 x8 link delivers an aggregate bandwidth of about 8 GB/s

SAS Limit
An LSI whitepaper, Switched SAS: Sharable, Scalable SAS Infrastructure (http://www.abacus.cz/prilohy/_5025/5025548/SAS_Switch_White%20Paper_US-EN_092210.pdf), shows how to calculate the SAS limit of an 8-lane controller port with a SAS bandwidth of 6 Gbps:

▪ 6 Gb/s x 8 lanes = 48 Gb/s per x8 port
▪ 48 Gb/s (8b/10b encoding) = 4.8 GB/s per port (per node)
▪ 4.8 GB/s per port x 88.33% (arbitration delays and additional framing) = 4320 MB/s per port
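Both limits are straightforward to recompute. The sketch below reproduces the arithmetic: PCIe Gen-3's 128b/130b encoding, and the SAS port calculation with its 8b/10b encoding plus a framing/arbitration derate factor (taken from the figures above).

```python
def pcie_gen3_gBps(lanes=8):
    """PCIe Gen-3: 8 GT/s per lane; 128b/130b encoding leaves 128/130
    of the line rate as payload. Returns payload GB/s for the link."""
    return 8.0 * lanes * (128.0 / 130.0) / 8.0  # Gb/s line rate -> GB/s payload

def sas_port_mBps(lane_gbps=6.0, lanes=8, efficiency=0.8833):
    """x8 SAS port: raw Gb/s, then 8b/10b encoding (10 line bits per
    payload byte), then framing/arbitration efficiency. Returns MB/s."""
    payload_gBps = lane_gbps * lanes / 10.0  # e.g. 48 Gb/s -> 4.8 GB/s
    return payload_gBps * 1000.0 * efficiency
```

An x8 Gen-3 slot therefore tops out just under 8 GB/s of payload bandwidth, comfortably above the ~4.2-4.3 GB/s of a 6 Gb/s x8 SAS port.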
IO LIMITS
CEILOMETER: High-level Architecture
Source: http://docs.openstack.org/developer/ceilometer/architecture.html
COLLECTD/GRAPHITE/GRAFANA
➢ Collectd
▪ Daemon to collect System Performance Statistics
▪ CPU, Memory, Disk, Network, Process, MariaDB, Load, Logged errors and more
➢ Graphite/Carbon
▪ Carbon receives metrics and flushes them to whisper database files
▪ Graphite is the webapp frontend to Carbon
➢ Grafana
▪ Visualizes metrics from multiple backends
SPEC CLOUD WORKLOADS
YCSB
Framework with a common set of workloads for evaluating the performance of different key-value and cloud serving stores.
KMeans
- Hadoop-based, CPU-intensive workload
- Chose the Intel HiBench implementation