Evaluation of AMD EPYC Chris Hollowell <[email protected]> HEPiX Fall 2018, PIC Spain


What is EPYC?

EPYC is a new line of x86_64 server CPUs from AMD based on their Zen microarchitecture

  • Same microarchitecture used in their Ryzen desktop processors
  • Released June 2017

First new high-performance series of server CPUs offered by AMD since 2012

  • Last were Piledriver-based Opterons
  • Steamroller Opteron products cancelled

AMD had focused on low-power server CPUs instead

  • x86_64 Jaguar APUs
  • ARM-based Opteron A CPUs

Many vendors are now offering EPYC-based servers, including Dell, HP and Supermicro


How Does EPYC Differ From Skylake-SP?

Intel’s Skylake-SP Xeon x86_64 server CPU line also released in 2017

Both Skylake-SP and EPYC CPU dies manufactured using 14 nm process

Skylake-SP introduced AVX512 vector instruction support in Xeon

  • AVX512 not available in EPYC
  • HS06 official GCC compilation options exclude autovectorization
  • Stock SL6/7 GCC doesn’t support AVX512 (support added in GCC 4.9+)
  • Not heavily used (yet) in HEP/NP offline computing

Both have models supporting 2666 MHz DDR4 memory

  • Skylake-SP: 6 memory channels per processor, 3 TB (2-socket system, extended memory models)
  • EPYC: 8 memory channels per processor, 4 TB (2-socket system)
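To put the channel counts in perspective, the sketch below estimates the theoretical peak memory bandwidth per socket. It assumes the "2666 MHz" figure means 2666 MT/s and that each DDR4 channel moves 8 bytes per transfer (64-bit bus); the function name is illustrative, and the results are upper bounds, not measurements from this evaluation.

```python
# Theoretical peak DDR4 bandwidth per socket (an upper bound, not a measurement).
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data bus per DDR4 channel


def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int = 2666) -> float:
    """Return the theoretical peak bandwidth of one socket in GB/s (10^9 bytes/s)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


skylake_sp = peak_bandwidth_gb_s(channels=6)
epyc = peak_bandwidth_gb_s(channels=8)

print(f"Skylake-SP: {skylake_sp:.1f} GB/s per socket")
print(f"EPYC:       {epyc:.1f} GB/s per socket")
print(f"EPYC / Skylake-SP: {epyc / skylake_sp:.2f}x")
```

With equal DIMM speeds the ratio is simply 8/6, so EPYC has roughly a third more theoretical bandwidth per socket.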


How Does EPYC Differ From Skylake-SP (Cont.)?

Some Skylake-SP processors include built-in Omni-Path networking, or FPGA coprocessors

  • Not available in EPYC

Both Skylake-SP and EPYC have SMT (HT) support

  • 2 logical cores per physical core (absent in some Xeon Bronze models)

Maximum core count (per socket)

  • Skylake-SP – 28 physical / 56 logical (Xeon Platinum 8180M)
  • EPYC – 32 physical / 64 logical (EPYC 7601)

Maximum socket count

  • Skylake-SP – 8 (Xeon Platinum)
  • EPYC – 2

Processor Interconnect

  • Skylake-SP – UltraPath Interconnect (UPI)
  • EPYC – Infinity Fabric (IF)

PCIe lanes (2-socket system)

  • Skylake-SP – 96
  • EPYC – 128 (some used by SoC functionality); same number available in single socket configuration


EPYC: MCM/SoC Design

EPYC utilizes an SoC design

  • Many functions normally found in the motherboard chipset are on the CPU (SATA controllers, USB controllers, etc.)

Each EPYC processor consists of four CPU dies, interconnected via Infinity Fabric

  • Multi-Chip Module (MCM) architecture
  • Each die contains two ”CPU Complexes” (CCX)
  • Each die attached to its own memory: 2 memory channels per die

All Skylake-SP cores are on a single die

AMD claims MCM results in a cost reduction by improving yields

  • Believed to scale better than the monolithic die approach as core counts continue to increase
  • Drawback: higher memory latency for non-NUMA-aware applications


EPYC: MCM/SoC Design (Cont.)

EPYC vs Skylake-SP (SNC Disabled) NUMA Configuration

    # lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                64
    On-line CPU(s) list:   0-63
    Thread(s) per core:    2
    Core(s) per socket:    16
    Socket(s):             2
    NUMA node(s):          8
    Vendor ID:             AuthenticAMD
    CPU family:            23
    Model:                 1
    Model name:            AMD EPYC 7351 16-Core Processor
    Stepping:              2
    CPU MHz:               2400.000
    CPU max MHz:           2400.0000
    CPU min MHz:           1200.0000
    BogoMIPS:              4799.41
    Virtualization:        AMD-V
    L1d cache:             32K
    L1i cache:             64K
    L2 cache:              512K
    L3 cache:              8192K
    NUMA node0 CPU(s):     0-3,32-35
    NUMA node1 CPU(s):     4-7,36-39
    NUMA node2 CPU(s):     8-11,40-43
    NUMA node3 CPU(s):     12-15,44-47
    NUMA node4 CPU(s):     16-19,48-51
    NUMA node5 CPU(s):     20-23,52-55
    NUMA node6 CPU(s):     24-27,56-59
    NUMA node7 CPU(s):     28-31,60-63

    # lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                72
    On-line CPU(s) list:   0-71
    Thread(s) per core:    2
    Core(s) per socket:    18
    Socket(s):             2
    NUMA node(s):          2
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 85
    Model name:            Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
    Stepping:              4
    CPU MHz:               2700.000
    BogoMIPS:              5404.41
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              1024K
    L3 cache:              25344K
    NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70
    NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71
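The NUMA difference in the two lscpu listings can be summarized programmatically. The following is a minimal sketch that parses only the "NUMA nodeN CPU(s)" lines of lscpu-style text; the helper names are illustrative, and the sample strings reproduce just the NUMA lines from the listings on this slide.

```python
def expand_cpu_list(cpu_list: str) -> list[int]:
    """Expand an lscpu-style CPU list such as '0-3,32-35' into individual CPU ids."""
    cpus = []
    for part in cpu_list.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_layout(lscpu_text: str) -> dict[int, list[int]]:
    """Map each NUMA node number to the logical CPUs lscpu lists for it."""
    layout = {}
    for line in lscpu_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        # Matches 'NUMA node0 CPU(s)' but not the 'NUMA node(s)' summary line.
        if key.startswith("NUMA node") and key.endswith("CPU(s)"):
            node = int(key[len("NUMA node"):-len("CPU(s)")])
            layout[node] = expand_cpu_list(value.strip())
    return layout


EPYC_7351_NUMA = """\
NUMA node(s):          8
NUMA node0 CPU(s):     0-3,32-35
NUMA node1 CPU(s):     4-7,36-39
NUMA node2 CPU(s):     8-11,40-43
NUMA node3 CPU(s):     12-15,44-47
NUMA node4 CPU(s):     16-19,48-51
NUMA node5 CPU(s):     20-23,52-55
NUMA node6 CPU(s):     24-27,56-59
NUMA node7 CPU(s):     28-31,60-63
"""

even = ",".join(str(c) for c in range(0, 72, 2))
odd = ",".join(str(c) for c in range(1, 72, 2))
XEON_6150_NUMA = f"NUMA node(s): 2\nNUMA node0 CPU(s): {even}\nNUMA node1 CPU(s): {odd}\n"

for name, text in (("EPYC 7351", EPYC_7351_NUMA), ("Xeon Gold 6150", XEON_6150_NUMA)):
    layout = numa_layout(text)
    sizes = sorted({len(cpus) for cpus in layout.values()})
    print(f"{name}: {len(layout)} NUMA nodes, logical CPUs per node: {sizes}")
```

On the dual-socket EPYC 7351 each NUMA node holds 8 logical CPUs, against 36 per node on the dual-socket Xeon Gold 6150, which is why NUMA-unaware jobs are more likely to touch remote memory on EPYC.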


Socket LGA 3647 & SP3

Both CPUs/sockets are quite large

Visible quadrants in the SP3 socket for the four CPU dies in the EPYC processor

Figure: Skylake-SP – Socket LGA 3647; EPYC – Socket SP3


Skylake and EPYC Model Lineup Comparison

Model                 Base Frequency       Cores  SMT  TDP    Memory          Retail
Xeon Bronze 3104      1.7 GHz (no turbo)   6      No   85W    2133 MHz DDR4   $213
Xeon Silver 4110      2.1 GHz              8      Yes  85W    2400 MHz DDR4   $501
Xeon Gold 5115        2.4 GHz              10     Yes  85W    2666 MHz DDR4   $1,221
Xeon Gold 6130        2.1 GHz              16     Yes  125W   2666 MHz DDR4   $1,900
Xeon Gold 6136        3.0 GHz              12     Yes  150W   2666 MHz DDR4   $2,460
Xeon Gold 6148        2.4 GHz              20     Yes  150W   2666 MHz DDR4   $3,072
Xeon Gold 6150        2.7 GHz              18     Yes  165W   2666 MHz DDR4   $3,358
Xeon Platinum 8170    2.1 GHz              26     Yes  165W   2666 MHz DDR4   $7,405
Xeon Platinum 8180M   2.5 GHz              28     Yes  205W   2666 MHz DDR4   $13,011
EPYC 7251             2.1 GHz              8      Yes  120W   2400 MHz DDR4   $475
EPYC 7351             2.4 GHz              16     Yes  170W   2666 MHz DDR4   $1,110 (uniprocessor (P) - $750)
EPYC 7401             2.0 GHz              24     Yes  170W   2666 MHz DDR4   $1,850 (uniprocessor (P) - $1,075)
EPYC 7451             2.3 GHz              24     Yes  180W   2666 MHz DDR4   $2,400
EPYC 7551             2.0 GHz              32     Yes  180W   2666 MHz DDR4   $3,400
EPYC 7601             2.2 GHz              32     Yes  180W   2666 MHz DDR4   $4,200


EPYC vs Skylake-SP: HEP/NP Performance Benchmarks

HEPSPEC06

  • “all_cpp” subset of SPEC-CPU2006 run in parallel

CERN Cloud Benchmark Suite

  • Various benchmarks, run in parallel: DB12, Whetstone, ATLAS KV

Unless noted, memory configured to utilize all 8 channels per CPU on EPYC, and 6 channels per CPU for Skylake-SP, with at least 2 GB RAM/logical core

  • ~11% HS06 performance degradation seen for EPYC 7441 when only populating half of the memory channels
  • All 2666 MHz DDR4
  • Noted dual rank (DR) DIMMs downclocked to 2400 MHz for EPYC

All run under SL/CentOS/RHEL 7

SMT/Hyperthreading enabled, unless otherwise indicated

Systems are dual CPU, unless noted


EPYC HEPSPEC06: SMT Off vs On

Bar chart of HS06 per CPU (y-axis: HS06, 0 to 1400). Values as labeled on the bars:

CPU          SMT  Threads              HS06   Memory
EPYC 7401P   Off  24 (uniprocessor)    368
EPYC 7401P   On   48 (uniprocessor)    489
EPYC 7351    Off  32                   541
EPYC 7351    On   64                   780
EPYC 7451    Off  48                   883    DDR4-2400
EPYC 7451    On   96                   1101   DDR4-2400
EPYC 7551    Off  64                   872
EPYC 7551    On   128                  1148
EPYC 7601    Off  64                   1078   DDR4-2400
EPYC 7601    On   128                  1296   DDR4-2400

25%+ HS06 performance improvement with SMT (“hyperthreading”) enabled


EPYC vs Skylake-SP: HEPSPEC06

Bar chart of HS06 per CPU (y-axis: HS06, 0 to 1400). Values as labeled on the bars:

CPU                  Threads              HS06   Notes
Xeon Gold 5115       40                   394    +
Xeon Gold 6130       64                   729
Xeon Gold 6136       48                   790
Xeon Gold 6148       80                   1068
Xeon Gold 6150       72                   1035
Xeon Platinum 8170   104                  1261   *
EPYC 7401P           48 (uniprocessor)    489
EPYC 7351            64                   780
EPYC 7451            96                   1101   DDR4-2400
EPYC 7551            128                  1148
EPYC 7601            128                  1296   DDR4-2400

+ = System using only 3 memory channels per CPU
* = Value reported by CERN


EPYC vs Skylake: HEPSPEC06 (Cont.)

Larger values are better

Similar maximum HS06 (~1,275) performance for the models tested

  • Data for highest level EPYC (7601), but not highest model Skylake-SP (8180M)
  • Can assume the Xeon Platinum 8180M would perform better than the 8170 value listed: more cores/threads per socket (28/56 vs 26/52) and a higher clock speed (2.5 GHz vs 2.1 GHz)

Mid-range model HS06 performance also similar

  • ~700 HS06 - ~1100 HS06

TDP somewhat higher for EPYC CPUs vs Xeon Gold, in general

  • 165 W max Xeon Gold, vs 180 W max EPYC
  • Can likely expect EPYC to use a bit more power as a result


EPYC vs Skylake-SP: CERN Cloud Benchmarks

Bar charts of aggregate DB12, Whetstone and ATLAS KV results per CPU. Values as labeled on the bars:

CPU                Threads   DB12 (Dirac HS06 Est.)   Whetstone (BWIPS)   ATLAS KV (Events/Sec)
Xeon Gold 5115 +   40        220                      114                 15
Xeon Gold 6150     72        998                      262                 65
EPYC 7351          64        733                      210                 67
EPYC 7551          128       1256                     361                 120

+ = System using only 3 memory channels per CPU


EPYC/Skylake: CERN Cloud Benchmarks (Cont.)

Results for a limited number of CPUs: only possible to run the full suite (including KV) on systems with the CVMFS client installed and set up

By default the CERN cloud benchmarks run one instance of a benchmark per logical core in parallel

  • However, the suite only reports performance per logical core
  • Interested in aggregate system performance, not performance/logical core
  • For DB12 and Whetstone, simply multiplied the result by the number of logical cores
  • For KV, average seconds/event per logical core is reported: took the inverse, and multiplied by the number of logical cores to obtain total events/sec

Larger graphed values are better

DB12 and Whetstone results fairly in line with HS06

Expected somewhat better KV performance for the Xeon Gold 6150
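The aggregation described on this slide can be sketched as follows. The function names are illustrative, and the per-core inputs in the example are hypothetical placeholders rather than values from the benchmark runs.

```python
def aggregate_score(per_core_score: float, logical_cores: int) -> float:
    """DB12 / Whetstone: scale the reported per-logical-core result to the whole system."""
    return per_core_score * logical_cores


def aggregate_kv_events_per_sec(seconds_per_event: float, logical_cores: int) -> float:
    """ATLAS KV: invert seconds/event to events/sec, then scale by the logical core count."""
    return (1.0 / seconds_per_event) * logical_cores


# Hypothetical per-core results for a 64-thread system (placeholders, not measured data).
logical_cores = 64
db12_total = aggregate_score(per_core_score=10.0, logical_cores=logical_cores)
kv_total = aggregate_kv_events_per_sec(seconds_per_event=0.5, logical_cores=logical_cores)

print(f"DB12 aggregate: {db12_total:.0f}")
print(f"KV aggregate:   {kv_total:.0f} events/sec")
```

Both conversions assume every logical core runs its own benchmark instance at the same time, which is the suite's default behaviour as described above.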


EPYC vs Skylake-SP: CPU HS06/Dollar

Bar chart of HS06/$ per CPU (y-axis: HS06/$, 0 to 0.5). Values as labeled on the bars:

CPU                  Threads              HS06/$   Notes
Xeon Gold 5115       40                   0.32     +
Xeon Gold 6130       64                   0.19
Xeon Gold 6136       48                   0.16
Xeon Gold 6148       80                   0.17
Xeon Gold 6150       72                   0.15
Xeon Platinum 8170   104                  0.09
EPYC 7401P           48 (uniprocessor)    0.45
EPYC 7351            64                   0.35
EPYC 7451            96                   0.23     DDR4-2400
EPYC 7551            128                  0.17
EPYC 7601            128                  0.15     DDR4-2400

Only retail CPU cost accounted for in calculations

  • Does not represent reality given memory and base server pricing

+ = System using only 3 memory channels per CPU
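A minimal sketch of how an HS06/$ figure of this kind can be computed: the system HS06 score divided by the retail price of the CPUs installed. The function name is illustrative. The inputs are the HS06 scores from the HEPSPEC06 chart and the retail prices from the model lineup table; that the chart was produced exactly this way is an assumption, although the three examples below round to the plotted values.

```python
def hs06_per_dollar(system_hs06: float, cpu_price: float, n_cpus: int = 2) -> float:
    """HS06 per dollar of retail CPU cost (memory and base server cost are ignored)."""
    return system_hs06 / (cpu_price * n_cpus)


# (system HS06, retail price per CPU in USD, CPUs per system)
examples = {
    "EPYC 7401P": (489, 1075, 1),      # uniprocessor (P) price
    "EPYC 7351": (780, 1110, 2),
    "Xeon Gold 6148": (1068, 3072, 2),
}

results = {name: hs06_per_dollar(*args) for name, args in examples.items()}
for name, value in results.items():
    print(f"{name:15s} {value:.2f} HS06/$")
```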


EPYC vs Skylake-SP: Estimated 25kHS06 Cost

Bar chart of estimated cost per CPU model (y-axis: $1k, 0 to 450). Values as labeled on the bars:

CPU                  DIMMs per server   Servers   Est. cost ($1k)   Notes
Xeon Gold 5115       6*16 GB            64        332               +
Xeon Gold 6130       12*16 GB           34        256
Xeon Gold 6136       12*8 GB            32        258
Xeon Gold 6148       12*16 GB           23        233
Xeon Gold 6150       12*16 GB           24        257
Xeon Platinum 8170   12*32 GB           20        417
EPYC 7401P           8*16 GB            51        256
EPYC 7351            16*8 GB            32        189
EPYC 7451            16*16 GB           23        221
EPYC 7551            16*16 GB           22        256
EPYC 7601            16*16 GB           19        251

Estimated total cost of 25kHS06 +-500HS06

  • Assuming $1,500 irreducible server cost, and retail CPU/memory pricing

+ = System using only 3 memory channels per CPU


EPYC vs Skylake-SP: Est. 25kHS06 Cost (Cont.)

Server counts to achieve 25kHS06 +-500HS06 (+-2%)

Majority of compute node cost in CPU and memory

  • Assuming no excessive local storage space or IOPS requirements
  • Typically the case for HEP/NP

Retail CPU costs used in estimate

  • Likely to receive volume or competitive discounts

Estimate assumes a server without CPUs and memory costs $1,500

  • Includes power supply, disk, NIC, etc.
  • Only accounts for cost of servers themselves. Associated costs such as racks, network switches, integration, shipping, etc. not included
  • Server vendors typically increase base prices for servers which support higher performing CPU models with higher TDP (e.g. due to bigger PSUs, etc.)
  • $1,500 server base cost may be lower than reality for systems with higher end CPUs

Memory costs: retail Samsung server DIMM pricing

  • DDR4 2666 MHz, ECC, registered
  • 8 GB DIMM - $137
  • 16 GB DIMM - $208
  • 32 GB DIMM - $380
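The cost model on this slide can be written out as a short sketch: servers × (base cost + CPUs + DIMMs). The function and constant names are illustrative. CPU prices come from the model lineup table, and the server counts, DIMM configurations and HS06 scores from the chart slides; the three examples below reproduce the plotted estimates for those CPUs, but that every bar was computed exactly this way is an assumption.

```python
BASE_SERVER_COST = 1500                   # USD, server without CPUs and memory
DIMM_PRICE = {8: 137, 16: 208, 32: 380}   # USD per DDR4-2666 ECC registered DIMM, by size in GB


def fleet_cost(servers: int, cpus: int, cpu_price: int, dimms: int, dimm_gb: int) -> int:
    """Total cost in USD of a fleet of identical servers under the slide's assumptions."""
    per_server = BASE_SERVER_COST + cpus * cpu_price + dimms * DIMM_PRICE[dimm_gb]
    return servers * per_server


# name: (servers, CPUs/server, CPU price, DIMMs/server, DIMM size in GB, HS06 per server)
configs = {
    "EPYC 7351": (32, 2, 1110, 16, 8, 780),
    "Xeon Gold 6148": (23, 2, 3072, 12, 16, 1068),
    "Xeon Platinum 8170": (20, 2, 7405, 12, 32, 1261),
}

costs = {}
for name, (servers, cpus, price, dimms, dimm_gb, hs06) in configs.items():
    costs[name] = fleet_cost(servers, cpus, price, dimms, dimm_gb)
    total_hs06 = servers * hs06
    print(f"{name:20s} {servers} servers, {total_hs06} HS06, ${costs[name]:,}")
```

The per-server base cost is paid once per chassis, so configurations that need many servers to reach 25kHS06 are penalized even when the CPU itself is cheap.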


EPYC vs Skylake-SP: Est. 25kHS06 Cost (Cont.)

Enough memory populated per system to provide 2 GB/logical core

  • Fairly standard for HEP/NP

Ensured all 6 (Skylake) or 8 (EPYC) memory channels per CPU utilized for maximum bandwidth, and NUMA performance

  • Often ended up with more RAM than required to satisfy 2 GB/logical core
  • 6 channels makes this particularly difficult for Skylake: installed memory not a power of two
  • Problem compounded by server manufacturers not offering smaller (i.e. 4 GB), less expensive DDR4 DIMMs

Estimated cost for 25kHS06 fairly similar between Skylake-SP Xeon Gold and EPYC servers: most close to $250k

Dual-CPU EPYC 7351 systems appear very cost effective, however: est. $189k

  • ~25% less than the Xeon Gold 6148
  • Required memory/logical core and DIMM channel parity
  • CPU itself is inexpensive compared to other Skylake/EPYC counterparts

EPYC 7401P uniprocessor system HS06/$ for CPU cost initially looked promising

  • Large number of servers required: irreducible per server cost added up
  • Estimated in the $250k range like many of the other CPUs
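The memory sizing constraint can be illustrated with a small sketch: populate every memory channel with identical DIMMs and pick the smallest available size that still gives 2 GB per logical core. The selection rule and the names below are assumptions made for illustration; the 8/16/32 GB sizes are the ones priced in these slides.

```python
DIMM_SIZES_GB = (8, 16, 32)  # DIMM sizes priced in the slides; 4 GB DIMMs were not offered


def pick_dimms(logical_cores: int, channels: int, gb_per_core: int = 2) -> tuple[int, int, int]:
    """Return (DIMM size in GB, installed GB, surplus GB) with one DIMM per memory channel."""
    required = logical_cores * gb_per_core
    for size in DIMM_SIZES_GB:
        installed = channels * size
        if installed >= required:
            return size, installed, installed - required
    raise ValueError("no listed DIMM size satisfies the requirement")


# name: (logical cores per system, memory channels per system)
systems = {
    "EPYC 7351 (2 CPUs)": (64, 16),
    "Xeon Gold 6148 (2 CPUs)": (80, 12),
    "Xeon Platinum 8170 (2 CPUs)": (104, 12),
    "EPYC 7401P (1 CPU)": (48, 8),
}

choices = {name: pick_dimms(cores, channels) for name, (cores, channels) in systems.items()}
for name, (size, installed, surplus) in choices.items():
    print(f"{name:28s} {size:2d} GB DIMMs, {installed} GB installed, {surplus} GB surplus")
```

The EPYC 7351 system lands exactly on the requirement (128 GB for 64 threads), while the 12-channel Skylake-SP systems overshoot, which is the parity effect noted in the bullets above.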


Side-Channel Attacks

Jan 2018 - New class of side-channel information disclosure vulnerabilities in CPU hardware made public

  • Meltdown, Spectre
  • Exploit speculative execution and caching optimizations in CPUs

List of similar side-channel attack vectors continues to grow

  • Speculative Store Bypass Vulnerability
  • Foreshadow (L1TF)

Microcode updates for Skylake-SP released for all of the above

AMD claims EPYC is not vulnerable to Meltdown or Foreshadow

  • Due to existing protections in their paging architecture
  • Released microcode updates for Spectre
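On Linux kernels that expose it, the per-vulnerability mitigation status can be read from the sysfs directory /sys/devices/system/cpu/vulnerabilities. The following is a minimal sketch of doing so; the function name is illustrative, the directory may be absent on older kernels, and this was not part of the evaluation presented here.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vulnerability_status(vuln_dir: Path = VULN_DIR) -> dict[str, str]:
    """Return {vulnerability name: kernel-reported status}, or {} if the directory is absent."""
    if not vuln_dir.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(vuln_dir.iterdir())
        if entry.is_file()
    }


for name, status in read_vulnerability_status().items():
    print(f"{name:24s} {status}")
```

The status text is whatever the running kernel reports for the installed CPU and microcode, so the output differs between hosts.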


The Future: EPYC 2 and Cascade Lake

EPYC 2 - Rome

  • Expected in early 2019
  • 7 nm process
  • Support for DDR4-3200 MHz DIMMs expected; still 8 channels
  • Max core count per socket increased to 64 (128 threads)
  • AVX512?

Cascade Lake Xeon

  • Expected end of 2018
  • 14 nm process; 10 nm process expected in Ice Lake in 2020
  • Max memory speed: DDR4-2600 DIMMs; still 6 channels
  • Max core count per socket remains at 28 (56 threads)
  • Expected to support VNNI instructions, which utilize the AVX512 units
  • Support for Optane 3D XPoint memory
  • Announced that the Frontera supercomputer at TACC will be Cascade Lake based – estimated to provide 35-40 PFLOPS without GPUs

Both CPUs will have Spectre mitigations built into hardware


Conclusions

The EPYC MCM architecture is considerably different from Skylake-SP’s single die configuration

Similar HEP/NP benchmark performance from mid/upper range Skylake-SP Xeon Gold and mid/upper range AMD EPYC CPUs

Pricing also similar

  • However, dual-CPU EPYC 7351-based systems appear to be a sweet spot for applications requiring 2 GB/logical core (somewhat typical for HEP/NP software)

Competition in the server CPU market will likely help reduce cost and spur innovation

EPYC2 (Rome) with its 7nm process and 64 physical cores appears poised to disrupt the existing balance of server CPU market share