VMware vSphere Scales on the Amazing Next-Gen Intel Xeon Architecture

Tom Adelmeyer, Principal Engineer, Intel

Richard A. Brunner, Principal Engineer, VMware

FUT3056BU

#VMworld #FUT3056BU

VMworld 2017 Content: Not for publication or distribution


Disclaimer

• This presentation may contain product features that are currently under development.

• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

• Technical feasibility and market demand will affect final delivery.

• Pricing and packaging for any new technologies or features discussed or presented have not been determined.


Agenda

1 Introduction to the Intel® Xeon® Scalable Processor

2 Modernize Your Datacenter

3 Simplify Your Datacenter

4 Accelerate Your Datacenter

5 Summary


July 2017: Intel® Xeon® Scalable platform

Supported now by VMware vSphere 6.0.u3 and vSphere 6.5

Delivers up to 1.65x average performance boost over the prior generation (1)

(1) Up to 1.65x geomean based on normalized generational performance going from Intel® Xeon® processor E5-26xx v4 to Intel® Xeon® Scalable processor (estimated based on Intel internal testing of OLTP Brokerage, SAP SD 2-Tier, HammerDB, Server-side Java, SPEC*int_rate_base2006, SPEC*fp_rate_base2006, Server Virtualization, STREAM* triad, LAMMPS, DPDK L3 Packet Forwarding, Black-Scholes, Intel Distribution for LINPACK).

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase. . Configurations: see slides 32, 33

Modernize: TCO / Security

Simplify: Deployment / Provisioning

Accelerate: Compute, Storage, and Network


Intel Processor Development* (2009 to 2017): Rapid Technology Innovation

Codename       Process   Change                    Platform      Brand
Nehalem        45nm      New microarchitecture     “Thurley”     Intel® Xeon® Processor
Westmere       32nm      New process technology    “Thurley”     Intel® Xeon® Processor
Sandy Bridge   32nm      New microarchitecture     “Romley”      Intel® Xeon® Processor
Ivy Bridge     22nm      New process technology    “Romley”      Intel® Xeon® Processor
Haswell        22nm      New microarchitecture     “Grantley”    Intel® Xeon® Processor
Broadwell      14nm      New process technology    “Grantley”    Intel® Xeon® Processor
Skylake        14nm      New microarchitecture     “Purley”      Intel® Xeon® Scalable Processor

* Source: Intel, edited by VMware


Intel® Xeon® Scalable processor tiers:

• Intel® Xeon® Platinum 8100

• Intel® Xeon® Gold 6100/5100

• Intel® Xeon® Silver 4100

• Intel® Xeon® Bronze 3100

Day 0 Support by VMware vSphere 6.0.u3, 6.5.ga, and vSAN 6.2 and 6.6


Intel® Xeon® Scalable Processor: Re-architected from the Ground Up (Code name “Skylake-SP”)

• Intel® AVX-512 with 32 DP flops per core

• Data center optimized cache hierarchy: 1MB L2 per core, non-inclusive L3

• New mesh interconnect architecture

• Enhanced memory subsystem

• Intel® Speed Shift Technology

• Optional Integrated Intel® Omni-Path Fabric (Intel® OPA)

• Modular IO with integrated devices

Features                                   Intel® Xeon® Processor E5-2600 v4                     Intel® Xeon® Scalable Processor
Cores per socket                           Up to 22                                              Up to 28
Threads per socket                         Up to 44 threads                                      Up to 56 threads
Last-level cache                           Up to 55 MB                                           Up to 38.5 MB (non-inclusive)
QPI/UPI speed (GT/s)                       2x QPI channels @ 9.6 GT/s                            Up to 3x UPI @ 10.4 GT/s
PCIe* lanes / controllers / speed (GT/s)   40 / 10 / PCIe* 3.0 (2.5, 5, 8 GT/s)                  48 / 12 / PCIe 3.0 (2.5, 5, 8 GT/s)
Memory population                          4 channels up to 3 RDIMMs, LRDIMMs, or 3DS LRDIMMs    6 channels up to 2 RDIMMs, LRDIMMs, or 3DS LRDIMMs
Max memory speed                           Up to 2400                                            Up to 2666
TDP (W)                                    55W-145W                                              70W-205W

[Figure: Skylake-SP block diagram: cores with shared L3, 2 or 3 UPI links, 6 channels DDR4, 48 lanes PCIe* 3.0, DMI3, optional Omni-Path HFI]


Breakthrough CPU Design: Intel® Mesh Architecture

Designed for next-generation Data Centers

✓ Maximizes performance

✓ Enables consistent, low latencies

✓ Optimized for data sharing and memory access between all CPU cores/threads for ideal memory bandwidth and capacity

✓ Data flows scale efficiently for 2, 4 & 8+ socket configurations

✓ Designed for modern virtualized and hybrid cloud implementations

[Figure: Ring Architecture compared with Mesh Architecture]


Intel® Mesh Architecture: Distributed Caching Home Agent

• Intel® UPI caching and home agents (CHA) are now distributed with each LLC bank

• Distributed CHA benefits

– Reduces traffic on mesh by eliminating home agent to LLC interaction

– Reduces latency by launching snoops earlier; obviates need for different snoop modes

[Figure: mesh layout (10-core, 2 UPI example): ten CHA / Core tiles, each with a core, a CHA, and a 1.375 MB LLC slice; two memory controllers (IMC 0, IMC 1) with DIMM0/DIMM1 on each of six channels; mesh stops for UPI0/1, PCIe x16, and DMI. Source: Intel]


Re-Architected L2 & LLC (Last-Level) Cache Hierarchy

[Figure: cache hierarchy comparison. Previous architectures: 256KB private L2 per core, shared LLC of 2.5MB/core (inclusive). Intel® Xeon® Scalable Processor: 1MB private L2 per core, shared LLC of 1.375MB/core (non-inclusive).]

• On-chip cache balance shifted from shared-distributed (previous architectures) to private-local (Skylake-SP):

– Shared-distributed: the shared-distributed LLC is the primary cache

– Private-local: the private L2 becomes the primary cache, with the shared LLC used as an overflow cache

• Shared LLC changed from inclusive to non-inclusive:

– Inclusive (prior architectures): LLC has copies of all lines in L2

– Non-inclusive (Skylake architecture): lines in L2 may not exist in LLC

“Skylake-SP” cache hierarchy architected specifically for the data center use case


Modernize Your Datacenter


Modernize Your Datacenter: Lower Operational Expenses

• Considerations

– Deployment ease

– Server utilization

– Energy costs

– Space

• Up to 4.2x more VMs per server compared to a 4-year-old server (1)

• Up to 65% lower total cost of ownership (2)

– Upgrading from a 4-year-old server to the Intel® Xeon® Scalable processor

– Reduced software and OS licensing fees, acquisition, maintenance, and infrastructure costs

Support More Virtual Machines Per Server

[Figure: 4.2 : 1 consolidation (1), from a 4-5 year old system (Intel® Xeon® processor E5-2690, Sandy Bridge, launched Q1’12) to the Intel® Xeon® Platinum 8180 processor]


Modernize Your DC w/ Intel® Solid State Drives: New for Intel® Xeon® Scalable Processors

• New line of Intel® 3D NAND SATA SSDs with the same rock-solid reliability, enterprise RAS features, and consistent performance.

• Packed with a deep feature set, Intel® 3D NAND SSDs for data centers are optimized for the data caching needs of cloud storage and software-defined infrastructures.

• Intel® Optane™ SSDs help eliminate data center storage bottlenecks and allow bigger, more affordable data sets. They can accelerate applications and reduce transaction costs for latency-sensitive workloads.

[Figure: product images: Intel® SSD DC S3700 Series, Intel® SSD DC P4600 Series, and Intel® Optane™ SSD DC P4800X Series, with 3D NAND and 3D XPoint™ Technology labels]


The Right Solution with Intel® SSDs

Data tiers: Memory Expansion; Caching & Fast Storage; Mainstream Storage

Solutions:

• DRAM + Intel® Optane™ SSD + Intel® Memory Drive Technology SW: bigger memory for new insights from larger working sets

• Intel® Optane™ SSD DC P4800X: new caching or fast storage tier for the most latency-sensitive applications

• Intel® SSD DC P4600 Series: accelerate cache tier and mixed workloads for faster results and increased storage scaling

• Intel® SSD DC P4500 Series: high-performance, massively scalable storage

• Dual Port versions of the Intel® Data Center SSD family for PCIe: hot tier and mixed workloads for Enterprise High Availability Storage Solutions

• Dual Port versions of the Intel® Data Center SSD family for PCIe: Enterprise High Availability Storage Solutions


Secure your DC with Intel® Trusted Execution Technology (Intel® TXT)

Launch flow on a server with TPM/Intel® TXT:

1. System powers on and Intel® TXT verifies BIOS/OS

2. Hypervisor measure is checked

– Hypervisor measure matches (MATCH!): 3. OS and applications are launched, known trusted

– Hypervisor measure does not match (POSSIBLE EXPLOIT!): 3. Policy action enforced, known untrusted

• System boot stack gets crypto-hashed before execution

• Hash values get safely stored in a Trusted Platform Module (TPM)

• Match to known-good values determines system trust status

• NEW: One-Touch Activation: OOB TXT/TPM remote discovery, enablement, activation independent of OEM/OS


Simplify Your Datacenter


Simplifying Deployment

For any specific workload or application that I manage...

• What are the best hardware and software components to deploy?

• How do I get everything integrated and working well together?

• How do I optimize and tune settings for best performance?

• How can I go faster?

• What happens when something changes?


Simplify Deployment with Intel® Select Solutions

• Simplified evaluation: tightly-specified HW and SW components, eliminating guesswork

• Workload optimized: designed and benchmarked to perform optimally for specific workloads

• Fast & easy to deploy: pre-defined settings and system-wide tuning, enabling smooth deployment

All Intel® Select Solution configurations and benchmark results are verified by Intel

Intel® Select Solutions for VMware vSAN

Visit www.intel.com/selectsolutions


Accelerate Your Applications


Delivering Performance Beyond Benchmarks


Intel® Advanced Vector Extensions-512 (AVX-512)

• AVX-512 performance:

– Achieves more work per cycle (doubles the width of data registers versus the prior CPU)

– Minimizes latency & overhead (doubles the number of registers)

– 2x FMA processing engines are available on Intel® Xeon® Platinum and Intel® Xeon® Gold

[Figure: element-wise vector add of x0..x7 and y0..y7 producing x0 + y0 .. x7 + y7; register widths compared: SSE (128-bit), AVX (256-bit), AVX-512 (512-bit)]

Vectorized loop using SSE, AVX, and AVX-512:

    double *x, *y, *z;
    for (int i = 0; i < N; i++) { z[i] = x[i] + y[i]; }

Family                Instruction set          SP FLOPs / cycle   DP FLOPs / cycle
Skylake               Intel® AVX-512 & FMA     64                 32
Haswell / Broadwell   Intel AVX2 & FMA         32                 16
Sandy Bridge          Intel AVX (256b)         16                 8
Nehalem               SSE4.2 (128b)            8                  4

Accelerates performance for demanding computational tasks

[Chart: LINPACK performance (1) in GFLOPs by instruction set: SSE4.2 669; AVX 1178; AVX2 2034; AVX512 3259]

https://www.codeproject.com/Articles/1182515/Vectorization-Opportunities-for-Improved-Perform

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. (1) Source as of June 2017: Intel internal measurements on platform with Xeon Platinum 8180, Turbo enabled, UPI=10.4, SNC1, 6x32GB DDR4-2666 per CPU, 1 DPC


VMmark 3.0 Improvement

~19% VMmark 3.0 improvement for the new Intel Xeon Platinum 8180 over the previous Intel Xeon E5-2699A v4, for two very similar high-performance 2-socket servers

CPU configurations in the VMmark 3.0 results:

• Intel Xeon Platinum 8180, 2-socket, 56 cores / 112 threads, 2.5 GHz

• Intel Xeon E5-2699A v4, 2-socket, 44 cores / 88 threads, 2.4 GHz

www.vmware.com/products/vmmark/results3x.html


Accelerate VMware vSAN with Intel® 3D NAND SSDs

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance. System Configuration: 4 Node vSAN* Cluster. Per Node configuration: Supermicro* SuperServer 2028U-TN24R4T+ Dual Intel® Xeon® E5-2687Wv4 (12 Core @ 3.0 Ghz), Supermicro* Server Board, 256 GB DDR4 RAM, Boot Drive, 1x Intel® SSD DC S3710 Series (200 GB, 2.5”), vSAN Intel 3D NAND Cluster: Virtual SAN SSDs - 2 Disk Groups comprised of 2x Intel® SSD DC P4600 Series (1.6TB, 2.5” SFF), 8x Intel® SSD DC P4500 Series (4 TB, 2.5” SFF), vSAN Intel 2D NAND Cluster: Virtual SAN SSDs - 2 Disk Groups comprised of 2x Intel® SSD DC P3700 Series (800GB, 2.5” SFF), 8x Intel® SSD DC P3500 Series (2 TB, 2.5” SFF), Intel® Ethernet Server Adapter X540-DA2

Increased Storage Capacity & Efficiency with Lower Cost

vSAN node with Intel® 2D NAND SSDs:

• Server: 2 x Intel® Xeon® E5

• Cache Tier: 2x Intel® SSD DC P3700 @ 800 GB

• Capacity Tier: 8x Intel® SSD DC P3500 @ 2 TB

vSAN node with Intel® 3D NAND SSDs:

• Server: 2 x Intel® Xeon® E5

• Cache Tier: 2x Intel® SSD DC P4600 @ 1.6 TB

• Capacity Tier: 8x Intel® SSD DC P4500 @ 4 TB

Transition to Intel® 3D NAND SSDs:

• Up to 2x increase in storage capacity

• Up to 63% lower cost/GB

• Up to 6% increase in IOPs


Intel “Purley” Design Recommendation: Support a minimum of four U.2 NVMe SSDs in cabled PCIe topology directly from CPU

Intel S2600WF (Wolf Pass)

Per-socket:

• 2 Memory Controllers

• 6 Memory Channels

• 12 DIMM slots

4 OCuLink connectors for NVMe U.2 drives


Intel “Skylake-SP” Volume Management Device (VMD)

• VMD is a mode of the Skylake-SP integrated PCIe Root Complex. It requires a special VMD driver.

• VMD allows vSAN to support NVMe hotplug and LED management

– VMD intercepts all PCIe hotplug events

– Hotplug looks like SAS LUN add/remove

– LED management uses a script or the esxcli tool

• vSphere ESXi sees only a VMD PCIe “HW RAID” device and loads the VMD driver

• The VMD driver for ESXi scans PCI, finds the NVMe controllers, and enumerates the NVMe devices

– Performed by the VMD driver built into the NVMe driver

• The VMD driver exposes each NVMe device's namespace as a SCSI LUN on the VMD controller

– Similar to the LUNs on a SAS controller

– Only the VMD driver knows how to access the NVMe controller memory and config space

[Figure: driver stack: vSphere Storage Interface above the Intel VMD Driver (PCIe domain management) and the Intel NVMe Driver (block I/O); PCIe config, errors & events, and I/O pass through the Intel VMD hardware and a switch to four NVMe SSDs]


Intel® QuickAssist Technology (Intel® QAT)

• Intel® QAT is a set of scalable hardware accelerators first introduced in 2012.

• Intel® QAT offers higher performance than software for:

– Symmetric cryptography functions including cipher operations and authentication operations

– Public key functions including RSA, Diffie-Hellman, and elliptic curve cryptography

– Compression and decompression functions including DEFLATE and LZS

• Intel® QAT on “Skylake-SP” offers outstanding capabilities:

– 100 Gbps crypto, 100 Gbps compression, 100 kOps/s RSA 2K decrypt

• Offered in two forms:

– Chipset option: integrated in a SKU of the C620 Series Platform Chipset (“Lewisburg”), and

– PCIe* card option: add-in PCIe card using the same “Lewisburg” chipset.

www.intel.com/content/www/us/en/embedded/technology/quickassist/overview.html


Accelerate with Intel® QuickAssist Technology

[Chart: Security Benchmarks, axis 0 to 110. Categories: RSA 2K Decrypt (kOps/s), IPSec Forwarding (Gbps), SSL Web Proxy (Gbps). Comparison: software-based OpenSSL versus OpenSSL with Intel® QAT.]

1. NGINX* and OpenSSL* connections/second. Conducted by Intel Applications Integration Team. Claim is actual performance measurement. Intel® microprocessor. Processor: Intel® Xeon® processor Scalable family with C6xxB0 ES2. Performance tests use cores from a single CPU, Memory configuration:, DDR4–2400. Populated with 1 (16 GB) DIMM per channel, total of 6 DIMMs. Intel® QuickAssist Technology driver: QAT1.7.Upstream.L.0.8.0-37 Fedora* 22 (Kernel 4.2.7) BIOS: PLYDCRB1.86B.0088.D09.16060117363. Cloudera* 5.4.2 with Snappy* Software vs. Intel® QuickAssist Technology hardware solution. Conducted by Intel Applications Integration Team. Claim is actual performance measurement. Intel® Xeon® processor E5-2699 v4 (56 cores enabled) 256 GB DDR4 1.6 TB NVMe SSD 1 Intel® C6xxx-based card (24x). 10 Gbps CentOS* 6.7 w/ 2.6.32 kernel Cloudera* 5.4.2; QAT driver 0.9.1 Snappy* 1.1.2 (popular, fast compression codec); One NameNode Eight DataNodes 10 Gbps network2. 24 Core Intel(r) Xeon Scalable Platform -SP @1.8GHz, Single (UP) Processor configuration. Intel(r) C627 PCH with crypto acceleration capability (in x16 mode) Neon City platform. DDR4 2400MHz RDIMMs 6x16GB(total 96 GB), 6 Channels, 1 x Intel®Corporation Red Rock Canyon 100GbE Ethernet Switch in the x16 PCIe slot on Socket 0. 8 cache ways allocated for DDIO.


Broad VMware Support For The Intel® Xeon® Scalable Processor

• Long History of Collaborating Together

• VMware vSphere 6.5 and 6.0.u3 provided Day 0 (launch) support for servers using the Intel® Xeon® Scalable Processor.

– vSphere 6.5 supports using AVX-512 in a virtual machine.

• 56 server designs were listed on Day 0 of launch.

– Currently at 103 server designs across 15 server OEMs

• vSAN 6.6 provided support for Ready Node models using the Intel® Xeon® Scalable Processor running VMware vSphere 6.5.u1 and 6.0.u3

– 30 vSAN Ready Node models listed today


Summary: Time to Upgrade!

Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance/datacenter. Configurations: see slide 35. *Other names and brands may be claimed as the property of others.
