May 8-11 2017 | Silicon Valley

EVALUATING WINDOWS 10
LEARN WHY YOUR USERS NEED GPU ACCELERATION

Jason Kyungho Lee, Sr Performance Engineer, NVIDIA GRID @ NVIDIA
Hari Sivaraman, Staff Engineer @ VMware


AGENDA

• Introduction

• Latest Announcements

• Windows 10 vs. Windows 7

• Performance Testing

• Summary


TESLA LINEUP FOR GRID
The most powerful data center GPUs targeted at graphics virtualization

                        M10                                 M6                        M60
Optimized for           User density                        Blade                     Performance
GPU                     Quad mid-level Maxwell              Single high-end Maxwell   Dual high-end Maxwell
CUDA Cores              2560 (640 per GPU)                  1536                      4096 (2048 per GPU)
Memory Size             32 GB GDDR5 (8 GB per GPU)          8 GB GDDR5                16 GB GDDR5 (8 GB per GPU)
H.264 1080p30 streams   28                                  18                        36
Max vGPU instances      64                                  16                        32
Form Factor             PCIe 3.0 Dual Slot (rack servers)   MXM (blade servers)       PCIe 3.0 Dual Slot (rack servers)
Power                   225W                                100W (75W opt)            240W / 300W (225W opt)
Thermal                 passive                             bare board                active / passive
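The "Max vGPU instances" figures in the lineup are consistent with dividing each board's total frame buffer by a fixed per-VM allocation. The sketch below checks that arithmetic. The 512 MB per-VM profile size is an assumption that is not stated on the slide, and the function name is hypothetical; real limits can also depend on factors other than memory.

```python
# Figures taken from the lineup table: (total frame buffer in GB, listed max vGPU instances).
BOARDS = {
    "M10": (32, 64),
    "M6": (8, 16),
    "M60": (16, 32),
}

# ASSUMPTION (not on the slide): the smallest vGPU profile reserves 512 MB per VM.
ASSUMED_SMALLEST_PROFILE_GB = 0.5


def max_instances_by_memory(total_gb: float, profile_gb: float) -> int:
    """Memory-only upper bound on the number of vGPU instances per board."""
    return int(total_gb // profile_gb)


for name, (total_gb, listed_max) in BOARDS.items():
    derived = max_instances_by_memory(total_gb, ASSUMED_SMALLEST_PROFILE_GB)
    print(f"{name}: derived {derived}, listed {listed_max}")
```

Under the same memory-only model, a larger profile lowers the bound proportionally: `max_instances_by_memory(32, 1.0)` gives 32 for an M10 board.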


LATEST ANNOUNCEMENTS

• Instant Clone Support (VMware Horizon 7.1)
  • Allows ultra-fast provisioning of virtual machines
  • NVIDIA is the only GPU vendor supported
• High Availability Support (VMware vSphere 6.5)
  • vSphere 6.5 supports HA for NVIDIA GRID vGPU-enabled virtual machines
• Multi-monitor support with Blast Extreme H.264 HW (VMware Horizon 7.1)
  • Offloads the H.264 encode to the NVIDIA GPU for improved and predictable UX

S7763 - DELIVER A TRANSFORMATIVE 3D GRAPHICS USER EXPERIENCE WITH VMWARE HORIZON, BLAST EXTREME ADAPTIVE TRANSPORT, AND NVIDIA GRID

S7429 - EXPERT AND CUSTOMER ROUNDTABLE: REAL-WORLD TALES OF GPU-ACCELERATED DESKTOPS AND APPS - IMPLEMENTERS SHARE BEST PRACTICES


WINDOWS 10


WINDOWS 10 NEW CHANGES

• Visually compelling Modern UI / menu with transparency
• Modern UI cannot be disabled; Windows 10 assumes a GPU is present
• GPU-accelerated virtual desktops / Task View / Alt-Tab preview
• GPU-accelerated video playback in the default media player
• GPU-accelerated font (DPI) and display scaling at ultra-high-definition resolutions
• Windows Display Driver Model (WDDM) 2.0 / DirectX 12 supported
• Microsoft Edge GPU acceleration


WINDOWS 10 REQUIRES MORE RESOURCES FOR IMPROVED USER EXPERIENCE

[Chart: Windows 10 requires more GPU frame buffer. Bars for Windows 7 (single 1920x1080), Windows 10 (single 1920x1080), Windows 10 (single 2560x1600), and Windows 10 (dual 1920x1080); axis 0 to 400, unit not shown]

[Chart: Windows 10 requires more CPU cycles. CPU host utilization % (0 to 100) over time, Windows 7 vs. Windows 10]

64 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload

15% more CPU utilization


WINDOWS START BUTTON EXPERIENCE
This is side-by-side


PERFORMANCE TESTING


TEST SETUP - SUBJECTIVE USER TESTING

• Two identical servers run LoginVSI Knowledge Worker to create a realistic customer environment
• CPU utilization of the hosts is around 60-80%
• Testers don’t know which session is GPU accelerated
• Testers do the same tasks on both systems
• Access devices (thin client, monitor, mouse, keyboard) are the same, with a single screen at 1080p resolution
• Predefined scenarios plus freestyle at the end
  • Scenarios include browsing, YouTube, creation of PowerPoint, Google Maps, and WebGL


CPU ONLY VS. NVIDIA GRID
GPU with NVENC provides an average positive increase to UX of 34%

[Chart: user experience score, 0.0 to 5.0, higher is better. Series: Horizon 7 with PCoIP - No GPU vs. Horizon 7 with Blast Extreme and H.264 HW]

Improvement labels on the chart: +20%, +5%, +19%, +65%, +6%, +21%, +55%, +26%, +9%, +13%, +13%, +30%, +68%, +133%

Testing ran on two identical systems. The CPU system was loaded up to 60-80% utilization; the GPU system ran the same workload.

User Experience Scale
1 Unacceptable, unusable - fire someone in IT!
2 Barely usable, borderline, but I’ll get tired of this soon
3 Tolerable, I guess I can make do
4 Pretty good for a virtual desktop
5 Outstanding - as good (or almost) as physical
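The 34% headline can be cross-checked against the improvement labels on the chart. The sketch below takes the fourteen labels as extracted and computes their arithmetic mean. It assumes, without confirmation from the slide, that the label list is complete and that the headline is a plain mean of those labels.

```python
from statistics import mean

# Improvement labels in percent, in the order they appear on the chart.
UPLIFT_PERCENT = [20, 5, 19, 65, 6, 21, 55, 26, 9, 13, 13, 30, 68, 133]

# ASSUMPTION: the headline figure is an unweighted mean of these labels.
average_uplift = mean(UPLIFT_PERCENT)

print(f"{len(UPLIFT_PERCENT)} labels, mean uplift {average_uplift:.1f}%")
```

The mean comes out at 34.5%, which is consistent with the stated average of 34% if the figure was truncated rather than rounded.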


CLICK TO PHOTON
What it is and why it matters

• Click-to-Photon is more than network latency
• Click-to-Photon is a key metric that contributes to the overall user experience
• Click-to-Photon defines how interactive/snappy the solution is
• Click-to-Photon measures the overall latency from the user perspective
• Click-to-Photon measures the time from the mouse click until the action is visible to the user
  • Includes the latency of USB device processing, rendering the frame, displaying the frame, etc.
• Click-to-Photon in remote environments (VDI, etc.) additionally includes
  • Encode latency, network latency and decode latency
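The components listed above add up to a single end-to-end figure, which a short sketch can make concrete. Every number below is a hypothetical placeholder rather than a measurement from this deck, the class and function names are invented for illustration, and counting the network once in each direction is a modeling assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageLatenciesMs:
    """Per-stage latencies in milliseconds. All defaults are hypothetical."""
    input_device: float = 8.0      # USB device processing
    render: float = 16.0           # rendering the frame
    display: float = 16.0          # displaying the frame
    encode: float = 5.0            # remote only
    decode: float = 5.0            # remote only
    network_one_way: float = 50.0  # remote only


def click_to_photon_local(s: StageLatenciesMs) -> float:
    """Local desktop: input processing, render, and display."""
    return s.input_device + s.render + s.display


def click_to_photon_remote(s: StageLatenciesMs) -> float:
    """Remote desktop: local stages plus encode, decode, and the network
    in both directions (click upstream, frame downstream)."""
    return click_to_photon_local(s) + s.encode + s.decode + 2 * s.network_one_way


stages = StageLatenciesMs()
print(f"local:  {click_to_photon_local(stages):.0f} ms")
print(f"remote: {click_to_photon_remote(stages):.0f} ms")
```

Even with these placeholder values, the sketch shows why the metric is more than network latency: reducing encode or render time lowers the total while the network terms stay fixed.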


CLICK TO PHOTON SIMPLIFIED
Click-to-Photon captures the overall latency

Access Device (upstream)
• Mouse button released
• Mouse click processed
• Packetized and encoded
• Packet transmitted

Network latency on the WAN (e.g. 50 ms)

Server
• Packet received
• Packet decoded
• Application: mouse click processed
• New frame rendered
• Frame captured via NVIDIA NVFBC
• Frame encoded via NVIDIA NVENC
• Frame transmitted

Network latency on the WAN (e.g. 50 ms)

Access Device (downstream)
• Packet received
• Packet decoded
• Frame displayed


CLICK TO PHOTON LATENCY
Blast Extreme with NVENC decreases latency up to 140 ms at <1 ms network latency

[Bar chart: click-to-photon latency in ms, lower is better; data labels listed in chart order]

Configuration                               Latency (ms)
Local PC with Integrated GPU                65
Blast Extreme, No GPU - JPEG/PNG            185
Blast Extreme, M10-1B - JPEG/PNG            155
Blast Extreme, No GPU - H.264 Software      165
Blast Extreme, M10-1B - H.264 Software      125
Blast Extreme, M10-1B - H.264 Hardware      107


CLICK TO PHOTON LATENCY
Comparing latency of single VM and at scale at <1 ms network latency

[Bar chart: click-to-photon latency in ms, lower is better; data labels listed in chart order]

Configuration                               Idle, 1 VM (ms)   Scale, 64 VMs (ms)
Local PC with Integrated GPU                65                -
Blast Extreme, No GPU - JPEG/PNG            185               250
Blast Extreme, M10-1B - JPEG/PNG            155               170
Blast Extreme, No GPU - H.264 Software      165               240
Blast Extreme, M10-1B - H.264 Software      125               160
Blast Extreme, M10-1B - H.264 Hardware      107               110

63 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency
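The latency figures above can be compared directly to see where the "up to 140 ms" claim comes from. The sketch below uses the chart's data labels in the order they were extracted; the mapping of each label to a configuration is an assumption based on that order, and the helper names are hypothetical.

```python
# Click-to-photon latency in ms, read from the chart data labels.
# ASSUMPTION: labels map to configurations in the order they appear.
LATENCY_MS = {
    "No GPU - JPEG/PNG": {"idle": 185, "scale": 250},
    "M10-1B - JPEG/PNG": {"idle": 155, "scale": 170},
    "No GPU - H.264 Software": {"idle": 165, "scale": 240},
    "M10-1B - H.264 Software": {"idle": 125, "scale": 160},
    "M10-1B - H.264 Hardware": {"idle": 107, "scale": 110},
}


def reduction_ms(baseline: str, candidate: str, load: str) -> int:
    """Latency saved by `candidate` relative to `baseline` at a given load."""
    return LATENCY_MS[baseline][load] - LATENCY_MS[candidate][load]


def scale_penalty_ms(config: str) -> int:
    """Extra latency a configuration incurs going from 1 VM to 64 VMs."""
    return LATENCY_MS[config]["scale"] - LATENCY_MS[config]["idle"]


for load in ("idle", "scale"):
    saved = reduction_ms("No GPU - JPEG/PNG", "M10-1B - H.264 Hardware", load)
    print(f"{load}: H.264 hardware saves {saved} ms versus No GPU - JPEG/PNG")

for config in LATENCY_MS:
    print(f"{config}: +{scale_penalty_ms(config)} ms from idle to scale")
```

Under this mapping, the at-scale difference between the no-GPU JPEG/PNG configuration and H.264 hardware encoding is 140 ms, which matches the headline figure. The hardware-encoded configuration also shows the smallest increase from idle to scale.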


HOST CPU OFFLOADING
Blast Extreme decreases CPU utilization on the host, up to 42%

[Chart: lower is better; axis ticks 0 to 100 and 0 to 90000, axis labels not shown. Series: NoGPU-PCoIP, GPU-PCoIP, NoGPU-JPEG, GPU-JPEG, NoGPU-Blast-H.264 CPU, GPU-Blast-H.264 CPU, GPU-Blast-NVENC]

63 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency


GUEST VM, REMOTING PROCESS CPU OFFLOADING
Blast Extreme decreases CPU utilization on the VM

[Chart: Remoting process utilization (PCoIP_server.exe or BlastW.exe) in Guest VM. Y axis: percent of one CPU core time (0 to 90, lower is better); X axis: time. Series: NoGPU-PCoIP, GPU-PCoIP, NoGPU-JPEG, GPU-JPEG, NoGPU-Blast-H.264 CPU, GPU-Blast-H.264 CPU, GPU-Blast-NVENC]

63 x Tesla M10-0B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency


VIDEO PLAYBACK
Up to 52% improved User Experience due to GRID vGPU and H.264

FPS is remoted FPS


VIDEO PLAYBACK

[Chart: Average FPS for a set of Videos. X axis: #VM (0 to 40); Y axis: FPS (10 to 25). Series: JPG+vGPU, HW-H264+vGPU, JPG-NOvGPU, SW-H264]

[Chart: Total FPS for a set of Videos. X axis: #VM (0 to 40); Y axis: FPS (5 to 805). Series: JPG+vGPU, HW-H264+vGPU, JPG-NOvGPU, SW-H264]


VIDEO PLAYBACK

[Chart: CPU-Util (%) for a set of Videos. X axis: #VM (0 to 35); Y axis: CPU-Util (%) (0 to 25). Series: JPG+vGPU, HW-H264+vGPU, JPG-NOvGPU, SW-H264]


VIDEOS


POWERPOINT ANIMATION
This is side-by-side


VIDEO PLAYBACK AND CPU OFFLOADING
This is side-by-side


SUMMARY


WINDOWS 10 IS DIFFERENT
Windows 10 is Microsoft’s most graphical operating system

• Windows 10 differs from Windows 7
  • Requires more CPU resources
  • Leverages the GPU more
• NVIDIA GRID vGPU
  • Improves user experience (as Microsoft intended)
  • Reduces Click-to-Photon latency (snappy user interaction)
  • Provides a predictable and consistent user experience
  • Reduces CPU cycles to allow higher user density

6/9/2017

May 8-11 2017 | Silicon Valley

THANK YOU