RE-ARCHITECTING THE DATACENTER: INTRODUCING THE INTEL® XEON® PROCESSOR SCALABLE FAMILY

SUMNER LEMON
DIRECTOR, GOVERNMENT & PUBLIC SECTOR SALES
INTEL CORPORATION ASIA/PACIFIC & JAPAN



CLOUD ECONOMICS

INTELLIGENT DATA PRACTICES

NETWORK TRANSFORMATION


INFRASTRUCTURE TRANSFORMATION REQUIRES

Performance

Security

Agility


INTEL® XEON® SCALABLE PLATFORM

THE INDUSTRY’S BIGGEST PLATFORM ADVANCEMENT IN A DECADE


INTEL® XEON® SCALABLE PLATFORM DELIVERS

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.

1. Geomean based on Normalized Generational Performance going from Intel® Xeon® processor E5-26xx v4 to Intel® Xeon® Scalable processor (estimated based on Intel internal testing of OLTP Brokerage, SAP SD 2-Tier, HammerDB, Server-side Java, SPEC*int_rate_base2006, SPEC*fp_rate_base2006, Server Virtualization, STREAM* triad, LAMMPS, DPDK L3 Packet Forwarding, Black-Scholes, Intel Distribution for LINPACK).

2. 2X gains in Reed Solomon Erasure Code: Intel Xeon® Processor Scalable Family: Platinum 8180 Processor, 28C, 2.5 GHz, H0, Neon City CRB, 12x16 GB DDR4 2666 MT/s ECC RDIMM, BIOS PLYCRB1.86B.0128.R08.1703242666. Intel® Xeon® E5-2600v4 Series Processor, E5-2650 v4, 12C, 2.2 GHz, Aztec City CRB, 4x8 GB DDR4 2400 MT/s ECC RDIMM, BIOS GRRFCRB1.86B.0276.R02.1606020546. Operating System: Redhat Enterprise Linux 7.3, Kernel 4.2.3, ISA-L 2.18, BIOS Configuration, P-States: Disabled, Turbo: Disabled, Speed Step: Disabled, C-States: Disabled, ENERGY_PERF_BIAS_CFG: PERF.

3. Up to 4.28x more VMs based on server virtualization consolidation workload: Based on Intel® internal estimates 1-Node, 2 x Intel® Xeon® Processor E5-2690 on Romley-EP with 256 GB Total Memory on VMware ESXi* 6.0 GA using Guest OS RHEL6.4, glassfish3.1.2.2, postgresql9.2. Data Source: Request Number: 1718, Benchmark: server virtualization consolidation, Score: 377.6 @ 21 VMs vs. 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Wolf Pass SKX with 768 GB Total Memory on VMware ESXi6.0 U3 GA using Guest OS RHEL 6 64bit. Data Source: Request Number: 2563, Benchmark: server virtualization consolidation, Score: 1580 @ 90 VMs. Higher is better.

4. Up to 65% lower 4-year TCO estimate example based on equivalent rack performance using VMware ESXi* virtualized consolidation workload comparing 20 installed 2-socket servers with Intel Xeon processor E5-2690 (formerly “Sandy Bridge-EP”) running VMware ESXi* 6.0 GA using Guest OS RHEL6.4 compared at a total cost of $919,362 to 5 new Intel® Xeon® Platinum 8180 (Skylake) running VMware ESXi6.0 U3 GA using Guest OS RHEL 6 64bit at a total cost of $320,879 including basic acquisition. Server pricing assumptions based on current OEM retail published pricing for 2-socket server with Broadwell based Intel Xeon processor systems; subject to change based on actual pricing of systems offered.
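The headline ratios on this slide can be rechecked from the figures quoted in footnotes 3 and 4. The following is a minimal sketch in Python; the helper names (`ratio`, `percent_lower`) and the variable names are illustrative assumptions and not part of the original deck, and the only inputs are the numbers stated in the footnotes.

```python
# Rough arithmetic check of footnotes 3 and 4.
# All inputs are copied from the footnotes; helper names are illustrative.

def ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old

def percent_lower(old_cost: float, new_cost: float) -> float:
    """Return the percentage by which `new_cost` is below `old_cost`."""
    return (1.0 - new_cost / old_cost) * 100.0

# Footnote 3: server virtualization consolidation benchmark.
OLD_VMS, NEW_VMS = 21, 90              # VMs hosted (E5-2690 vs Platinum 8180)
OLD_SCORE, NEW_SCORE = 377.6, 1580.0   # benchmark scores, higher is better

# Footnote 4: 4-year TCO estimate in US dollars.
OLD_TCO_USD, NEW_TCO_USD = 919_362, 320_879

vm_ratio = ratio(NEW_VMS, OLD_VMS)
score_ratio = ratio(NEW_SCORE, OLD_SCORE)
tco_saving = percent_lower(OLD_TCO_USD, NEW_TCO_USD)

print(f"VM count ratio: {vm_ratio:.2f}x")
print(f"Score ratio:    {score_ratio:.2f}x")
print(f"4-year TCO:     {tco_saving:.1f}% lower")
```

Under these inputs the VM count ratio is 90/21, a little under 4.29, which is consistent with the "up to 4.28x" in the footnote if that figure was truncated rather than rounded. The TCO difference works out to roughly 65%, matching the headline.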

Performance Agility Security

65% LOWER TOTAL COST OF OWNERSHIP VS 4-YEAR-OLD SERVER⁴

4.2X GREATER VM CAPACITY VS 4-YEAR-OLD SERVER³

UP TO 2X DATA PROTECTION PERFORMANCE GEN OVER GEN²

UP TO 1.65X AVERAGE GENERATIONAL GAINS¹


PERFORMANCE

1. Up to 2X flops per cycle: when compared with Intel® AVX2 available on previous E5/E7 v3, v4 families.
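The "up to 2X flops per cycle" comparison follows from the vector width doubling from 256 bits (Intel® AVX2) to 512 bits (Intel® AVX-512). A minimal sketch of that arithmetic in Python follows. It assumes two fused multiply-add (FMA) units per core on both generations, which the slide does not state and which does not hold for every processor model; the function name is illustrative.

```python
# Peak floating-point operations per core per clock cycle.
# Assumption (not from the slide): two FMA units per core on both generations.

def flops_per_cycle(vector_bits: int, element_bits: int, fma_units: int = 2) -> int:
    """Peak flops per cycle for one core using FMA instructions."""
    lanes = vector_bits // element_bits   # elements processed per instruction
    return lanes * 2 * fma_units          # one FMA = multiply + add = 2 flops

avx2_dp = flops_per_cycle(vector_bits=256, element_bits=64)    # double precision
avx512_dp = flops_per_cycle(vector_bits=512, element_bits=64)

print(f"AVX2    double precision: {avx2_dp} flops/cycle")
print(f"AVX-512 double precision: {avx512_dp} flops/cycle")
print(f"Ratio: {avx512_dp / avx2_dp:.0f}x")
```

This is a peak figure only. Sustained throughput depends on clock frequency under vector load, memory bandwidth, and how well the code vectorizes, which is why the slide says "up to".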

Intel® AVX-512
NEW LEVEL OF VECTOR PERFORMANCE
UP TO 2X FLOPS PER CLOCK CYCLE¹

Intel® Mesh Architecture
OPTIMIZED CORES, CACHE, MEMORY, I/O

Intel® QuickAssist Technology
UP TO 100GB/S ENCRYPTION @1K PACKETS
UP TO 100GB/S COMPRESSION @8K BLOCK


INTEL® MESH ARCHITECTURE

Video


Other names and brands may be claimed as the property of others. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks/datacenter. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

1. XXX world records as of July 11, 2017. Full details available at: www.intel.com/xxxxx

58 ESTABLISHED TODAY


HPE HAS 16 OF THE WORLD RECORDS & COUNTING…

Source: https://www.intel.com/content/www/us/en/benchmarks/server/xeon-scalable/xeon-platinum-world-record.html


SECURITY

Security Without Compromise

1. 2X gains in Reed Solomon Erasure Code: Intel Xeon® Processor Scalable Family: Platinum 8180 Processor, 28C, 2.5 GHz, H0, Neon City CRB, 12x16 GB DDR4 2666 MT/s ECC RDIMM, BIOS PLYCRB1.86B.0128.R08.1703242666. Intel® Xeon® E5-2600v4 Series Processor, E5-2650 v4, 12C, 2.2 GHz, Aztec City CRB, 4x8 GB DDR4 2400 MT/s ECC RDIMM, BIOS GRRFCRB1.86B.0276.R02.1606020546. Operating System: Redhat Enterprise Linux 7.3, Kernel 4.2.3, ISA-L 2.18, BIOS Configuration, P-States: Disabled, Turbo: Disabled, Speed Step: Disabled, C-States: Disabled, ENERGY_PERF_BIAS_CFG: PERF.

2. BigBench query Runtime/second. Testing done by Intel.
BASELINE: Platform 8168, NODES 1 Mgmt + 6 Workers, Make Intel Corporation, Model S2600WFD, Form Factor 2U, Processor Intel(R) Xeon(R) Platinum 8168 processor, Base Clock 2.70 GHz, Cores per socket 24, Hyper-Threading Enabled, NUMA mode Enabled, RAM 384GB DDR4, RAM Type 12x 32GB DDR4, OS Drive Intel® SSD DC S3710 Series (800GB, 2.5in SATA 6Gb/s, 20nm, MLC), Data Drives 8x - Seagate Enterprise 2.5 HDD ST2000NX0403 2TB, Intel® SSD DC P3520 Series (2.0TB), Temp Drive DC 3520 2TB, NIC Intel X722 10GBE - Dual Port, Hadoop Cloudera 5.11, Benchmark BigBench, Operating System CentOS Linux release 7.3.1611 (Core); HDFS encryption turned OFF.
vs.
NEW: Platform 8168, NODES 1 Mgmt + 6 Workers, Make Intel Corporation, Model S2600WFD, Form Factor 2U, Processor Intel(R) Xeon(R) Platinum 8168 processor, Base Clock 2.70 GHz, Cores per socket 24, Hyper-Threading Enabled, NUMA mode Enabled, RAM 384GB DDR4, RAM Type 12x 32GB DDR4, OS Drive Intel® SSD DC S3710 Series (800GB, 2.5in SATA 6Gb/s, 20nm, MLC), Data Drives 8x - Seagate Enterprise 2.5 HDD ST2000NX0403 2TB, Intel® SSD DC P3520 Series (2.0TB), Temp Drive DC 3520 2TB, NIC Intel X722 10GBE - Dual Port, Hadoop Cloudera 5.11, Benchmark BigBench, Operating System CentOS Linux release 7.3.1611 (Core); HDFS encryption turned ON.

Protect the Data

INTEL® KEY PROTECTION TECHNOLOGY

PROTECT KEYS FROM SOFTWARE ATTACKS

Secure the Platform

HARDWARE ROOT OF TRUST

0.37% ENCRYPTION PERFORMANCE OVERHEAD²

UP TO 2X DATA PROTECTION PERFORMANCE GEN OVER GEN¹


AGILITY

Advanced RAS Features

Artificial Intelligence

Intel® Volume Management Device WITH INTEL® OPTANE™ SSDs

Enhanced Virtualization WITH MODE BASED EXECUTION


MOST AGILE, SCALABLE AI PLATFORM

Real-time workloads, like inferencing, run on Xeon

Potent Performance

Production Ready

Built-in ROI

1. Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Compared with Platform: 2S Intel® Xeon® CPU E5-2699 v3 @ 2.30GHz (18 cores), HT enabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 256GB DDR4-2133 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.el7.x86_64. OS drive: Seagate* Enterprise ST2000NX0253 2 TB 2.5" Internal Hard Drive. Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=36, CPU Freq set with cpupower frequency-set -d 2.3G -u 2.3G -g performance. Intel Caffe: (http://github.com/intel/caffe/), revision b0ef3236528a2c7d2988f249d347d5fdae831236. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet, AlexNet, and ResNet-50), GCC 4.8.5, MKLML version 2017.0.2.20170110. BVLC-Caffe: https://github.com/BVLC/caffe, Inference & Training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. BVLC Caffe (http://github.com/BVLC/caffe), revision 91b09280f5233cafc62954c98ce8bc4c204e7475 (commit date 5/14/2017). BLAS: atlas ver. 3.10.1.

2. Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Compared with Platform: 2S Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz (22 cores), HT enabled, turbo disabled, scaling governor set to “performance” via acpi-cpufreq driver, 256GB DDR4-2133 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3500 Series (480GB, 2.5in SATA 6Gb/s, 20nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=44, CPU Freq set with cpupower frequency-set -d 2.2G -u 2.2G -g performance. Neon: ZP/MKL_CHWN branch commit id:52bd02acb947a2adabb8a227166a7da5d9123b6d. Dummy data was used. The main.py script was used for benchmarking, in mkl mode. ICC version used: 17.0.3 20170404, Intel MKL small libraries version 2018.0.20170425; Inference and training throughput uses FP32 instructions.
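The benchmark settings in the footnotes above follow a regular pattern: in each configuration OMP_NUM_THREADS equals sockets times physical cores per socket (56 = 2 x 28, 36 = 2 x 18, 44 = 2 x 22), whether or not Hyper-Threading is enabled. A minimal sketch in Python that reconstructs the environment variables and the `cpupower` command line from those values is shown below. The `Platform` class and both helper functions are hypothetical names introduced for illustration; the space after the comma in the footnote's KMP_AFFINITY value is dropped here on the assumption that it is a typesetting artifact. The sketch only builds the settings and does not execute anything.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Platform:
    """Benchmark host settings as listed in the footnotes (illustrative type)."""
    name: str
    sockets: int
    cores_per_socket: int
    kmp_affinity: str
    min_freq: str
    max_freq: str

def openmp_env(p: Platform) -> dict[str, str]:
    """Build the OpenMP environment: one thread per physical core."""
    return {
        "KMP_AFFINITY": p.kmp_affinity,
        "OMP_NUM_THREADS": str(p.sockets * p.cores_per_socket),
    }

def cpupower_cmd(p: Platform) -> list[str]:
    """Build the frequency-pinning command quoted in the footnotes."""
    return ["cpupower", "frequency-set",
            "-d", p.min_freq, "-u", p.max_freq, "-g", "performance"]

# Values copied from the footnotes.
PLATINUM_8180 = Platform("Xeon Platinum 8180", 2, 28,
                         "granularity=fine,compact", "2.5G", "3.8G")
E5_2699_V3 = Platform("Xeon E5-2699 v3", 2, 18,
                      "granularity=fine,compact,1,0", "2.3G", "2.3G")
E5_2699_V4 = Platform("Xeon E5-2699 v4", 2, 22,
                      "granularity=fine,compact,1,0", "2.2G", "2.2G")

for platform in (PLATINUM_8180, E5_2699_V3, E5_2699_V4):
    print(platform.name, openmp_env(platform), " ".join(cpupower_cmd(platform)))
```

To reproduce a run, the dictionary could be merged into the process environment before launching the benchmark command named in the footnotes (for example "caffe time"), and the `cpupower` command would need to be run with sufficient privileges beforehand.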

cut training time from days to hours

up to 113X performance with optimized software VS INTEL XEON E5 V3¹

up to 2.2X deep learning performance VS PRIOR GEN²


GEN10 & XEON® SCALABLE FAMILY


IN SUMMARY…

Video
