Inside XBOX ONE by Martin Fuller

Inside Xbox One
Martin Fuller
Xbox Advanced Technology Group

AMD AND MICROSOFT GAME DEVELOPER DAY - June 2 2014, STOCKHOLM

NDA

• This is a non-NDA event.
• That means there is a limit to how much I can say, so go easy!

CPU

AMD Jaguar (x64): 8 cores, arranged in 2 clusters of 4 cores each
• 1.75 GHz
• Dual issue
• Out-of-order execution
• Speculative execution
• Store-to-load forwarding
• SSE4.2 and AVX (dot product!)
• 16 x 256-bit wide floating point registers
• Hardware pre-fetch

Memory

• 8 GiB of DDR3 at 68 GiB/s, low latency
• Not enough bandwidth to touch all of memory every frame: treat RAM as a super-fast cache
• 48-bit virtual address space (256 TiB): tricky to fragment! Synced between CPU and GPU
• 4 MiB of L2 cache, 2 MiB per cluster
• MOESI protocol for cache coherency
• 16-way set associative
• Per core, up to eight cache requests in flight at once
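To put the "RAM as a super-fast cache" point into numbers, here is a back-of-the-envelope sketch using the slide's 68 GiB/s and 8 GiB figures. The 60 fps and 30 fps frame rates are assumptions for illustration, and the calculation uses theoretical peak bandwidth with no contention from the GPU or DMA, so real figures will be lower.

```cpp
#include <cassert>

// Figures from the Memory slide. The frame rates passed in are illustrative assumptions.
const double kGiB = 1024.0 * 1024.0 * 1024.0;
const double kDramBytes = 8.0 * kGiB;        // 8 GiB of DDR3
const double kDramBandwidth = 68.0 * kGiB;   // 68 GiB/s peak, as quoted on the slide

// Peak bytes the bus could move during one frame at the given frame rate.
double bytes_per_frame(double bandwidth_bytes_per_s, double fps) {
    assert(fps > 0.0);
    return bandwidth_bytes_per_s / fps;
}

// Fraction of all DRAM that could be touched once per frame, assuming the
// theoretical peak is reached and nothing else (GPU, DMA) competes for the bus.
double fraction_touchable_per_frame(double fps) {
    return bytes_per_frame(kDramBandwidth, fps) / kDramBytes;
}
```

At 60 fps this works out to roughly 1.13 GiB per frame, about 14% of the 8 GiB; at 30 fps it is still only about 28%.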

CPU – Recommendations

1. Store-to-load forwarding saves the dreaded load-hit-store (LHS) stall, but not spilling registers out to memory at all is even better.
2. The branch predictor is not a crystal ball: branchless tricks learnt in the Xbox 360 era can still apply.
3. Hardware data pre-fetch is awesome, but it only works with arrays.
4. Avoid aliasing loads/stores on 2 KiB alignments: this causes a false positive that delays load execution.
5. Go wide with SSE and leverage all cores. No brainer.
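Points 2 and 4 can be sketched in portable C++. The branchless helpers replace a data-dependent branch with arithmetic on the comparison result; whether the compiler emits a conditional move or a mask sequence is up to it, so check the disassembly. The alias check assumes, from the slide's wording, that the false positive occurs when a load and an earlier store match in their low 11 address bits (their offset within a 2 KiB window). All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Branchless select: returns a when cond is true, otherwise b.
// mask is all ones when cond is true and all zeros when it is false.
std::uint32_t select_u32(bool cond, std::uint32_t a, std::uint32_t b) {
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
    return (a & mask) | (b & ~mask);
}

// Count elements above a threshold without a data-dependent branch in the loop
// body: the comparison result (0 or 1) is added directly. A linear walk over a
// contiguous array is also the access pattern hardware pre-fetch handles well.
std::size_t count_above(const float* values, std::size_t n, float threshold) {
    assert(values != nullptr || n == 0);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += static_cast<std::size_t>(values[i] > threshold);
    }
    return count;
}

// Assumption: two addresses "alias on 2 KiB" when their low 11 bits match,
// i.e. they sit at the same offset within a 2 KiB window.
bool aliases_on_2kib(const void* p, const void* q) {
    const std::uintptr_t kMask = 2048u - 1u;
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(q);
    return ((a ^ b) & kMask) == 0;
}
```

A buffer pair that trips `aliases_on_2kib` in a hot loop (for example, a source and destination allocated exactly 2 KiB or 4 KiB apart) is a candidate for a small padding offset.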

GPU

• AMD GCN, 768 SPUs
• 853 MHz
• 32 MiB of ESRAM at 102 GiB/s
• 4 Move Engines
• 3 hardware display planes: resolution independent and frame rate independent
• Exact sRGB this time! (Oh, and it’s free.)
• Hardware video encode and decode
• HDMI 1.4a in and out

Move Engines

More than just DMA copy:
• Memory set
• Texture swizzle
• JPEG decompress
• LZ compress and decompress
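Texture swizzling reorders texels so that neighbours in 2D end up close together in memory. The sketch below uses Morton (Z-order) indexing as a stand-in to show the idea; it is not the Xbox One's actual tiling layout, which the Move Engines and the driver take care of. It assumes a square, power-of-two texture of 32-bit texels, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Spread the low 16 bits of v so that a zero bit sits between each original bit.
std::uint32_t part1by1(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Morton (Z-order) index: x bits in the even positions, y bits in the odd positions.
std::uint32_t morton2d(std::uint32_t x, std::uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}

// Copy a square, power-of-two, row-major texture into Morton order.
std::vector<std::uint32_t> swizzle_morton(const std::vector<std::uint32_t>& linear,
                                          std::uint32_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    assert(linear.size() == static_cast<std::size_t>(size) * size);
    std::vector<std::uint32_t> out(linear.size());
    for (std::uint32_t y = 0; y < size; ++y) {
        for (std::uint32_t x = 0; x < size; ++x) {
            out[morton2d(x, y)] = linear[static_cast<std::size_t>(y) * size + x];
        }
    }
    return out;
}
```

With this layout a 2x2 block of texels always occupies four consecutive slots, which is what makes swizzled textures friendlier to the texture cache than row-major ones.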

ESRAM

• 32 MiB of general purpose RAM: not like the EDRAM on Xbox 360
• 102 GiB/s, sometimes faster in practice!
• Zero contention: not shared with the CPU, the SRAs or video out
• ESRAM makes everything better: render targets, textures, geometry, compute tasks
• Use DRAM + ESRAM at the same time

ESRAM – The Four Stages of Adoption

1. Statically allocate a small number of render targets in ESRAM
2. Alias the same memory for re-use later
3. Partial residency: put the top strip of render targets (sky) in DRAM, the rest in ESRAM
4. Asynchronously DMA resources in/out of ESRAM

Launch titles were at stages 1-2. The second wave of titles is now starting to tackle stages 3 and/or 4. The third wave and beyond will get really good at this!
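Stage 3 (partial residency) comes down to working out how many rows of a render target the ESRAM budget can hold, with the remaining top rows (the sky) left in DRAM. This is a minimal sketch that assumes a linear row layout and ignores the tiling and alignment padding real render targets have; the function names and the budget used below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rows (counted up from the bottom of the image) that fit in the given ESRAM budget.
// Assumes a linear layout: each row occupies exactly width * bytes_per_pixel bytes.
std::uint32_t esram_rows(std::uint32_t width, std::uint32_t height,
                         std::uint32_t bytes_per_pixel, std::uint64_t esram_budget_bytes) {
    assert(width > 0 && bytes_per_pixel > 0);
    const std::uint64_t row_bytes = static_cast<std::uint64_t>(width) * bytes_per_pixel;
    const std::uint64_t rows = esram_budget_bytes / row_bytes;
    return rows < height ? static_cast<std::uint32_t>(rows) : height;
}

// Remaining top rows (e.g. the sky), which stay in DRAM.
std::uint32_t dram_rows(std::uint32_t width, std::uint32_t height,
                        std::uint32_t bytes_per_pixel, std::uint64_t esram_budget_bytes) {
    return height - esram_rows(width, height, bytes_per_pixel, esram_budget_bytes);
}
```

For example, a 1920x1080 target at 4 bytes per pixel needs 7,680 bytes per row, so a 6 MiB budget holds 819 rows and the top 261 rows would live in DRAM.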

ESRAM – Memory Maps!

It’s like the 8-bit days all over again! (Sort of.)
• Plan the asynchronous moves
• Move resources in/out asynchronously while also rendering
• New memory map at each stage of the render pipeline
• Don’t forget: swizzle textures on DMA
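A per-stage memory map can be planned ahead of time. The sketch below assumes each resource is described by its size and by the first and last pipeline stage that uses it, and places resources first-fit so that two resources share ESRAM bytes only when their lifetimes do not overlap. It is one simple heuristic among many, with hypothetical names, and it ignores alignment and the time the asynchronous moves themselves take.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Resource {
    std::uint64_t size;    // bytes
    int first_stage;       // first render pipeline stage that uses it
    int last_stage;        // last stage that uses it (inclusive)
    std::uint64_t offset;  // output: assigned ESRAM offset
};

bool lifetimes_overlap(const Resource& a, const Resource& b) {
    return a.first_stage <= b.last_stage && b.first_stage <= a.last_stage;
}

bool ranges_overlap(std::uint64_t a_off, std::uint64_t a_size,
                    std::uint64_t b_off, std::uint64_t b_size) {
    return a_off < b_off + b_size && b_off < a_off + a_size;
}

// First-fit placement in list order. Returns false if a resource does not fit.
bool place_aliased(std::vector<Resource>& res, std::uint64_t capacity) {
    for (std::size_t i = 0; i < res.size(); ++i) {
        assert(res[i].first_stage <= res[i].last_stage);
        // The lowest valid offset is 0 or the end of a resource that is live at the same time.
        std::vector<std::uint64_t> candidates{0};
        for (std::size_t j = 0; j < i; ++j) {
            if (lifetimes_overlap(res[i], res[j])) {
                candidates.push_back(res[j].offset + res[j].size);
            }
        }
        std::sort(candidates.begin(), candidates.end());
        bool placed = false;
        for (std::uint64_t off : candidates) {
            if (off + res[i].size > capacity) break;  // later candidates are larger still
            bool clash = false;
            for (std::size_t j = 0; j < i && !clash; ++j) {
                clash = lifetimes_overlap(res[i], res[j]) &&
                        ranges_overlap(off, res[i].size, res[j].offset, res[j].size);
            }
            if (!clash) {
                res[i].offset = off;
                placed = true;
                break;
            }
        }
        if (!placed) return false;
    }
    return true;
}
```

Two targets used only in stages 0-1 and 2-3 end up sharing offset 0, while a third that lives across stages 1-2 is pushed above both.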

Maxing out the GPU

1. Are you bandwidth limited?
2. Have you maxed out the fixed function hardware?
3. Do you have spare compute resource?

Then use async compute! Titles have barely scratched the surface yet: watch this space!

The usual GPU recommendations

1. Use ESRAM: first for depth/stencil, then colour targets, then everything else
2. Sort by state and shader, and use hardware instancing (batch, batch, batch!)
3. Always swizzle textures
4. Be wary of using too many general purpose registers: keep an eye on occupancy in PIX; we normally recommend an occupancy of 4 or more
5. Avoid reading DRAM via the CPU-coherent bus
6. There is no hardware integer divide
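The link between register use and occupancy (point 4) can be estimated by hand. This sketch assumes commonly documented GCN figures: 256 vector registers (VGPRs) available per SIMD lane, allocation in granules of 4, and at most 10 wavefronts per SIMD. It ignores the scalar register and LDS limits, which can lower occupancy further, so treat PIX as the source of truth. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Estimated wavefronts per SIMD from the VGPR count of a shader, under the
// assumed GCN limits above (VGPRs only; SGPR and LDS limits not modelled).
int waves_per_simd(int vgprs_per_thread) {
    assert(vgprs_per_thread > 0 && vgprs_per_thread <= 256);
    const int kTotalVgprs = 256;   // per SIMD lane
    const int kGranule = 4;        // registers are allocated in blocks of 4
    const int kMaxWaves = 10;      // hardware cap per SIMD
    const int allocated = (vgprs_per_thread + kGranule - 1) / kGranule * kGranule;
    return std::min(kMaxWaves, kTotalVgprs / allocated);
}
```

Under these assumptions a shader using 64 VGPRs just reaches the recommended occupancy of 4, while one using 65 drops to 3.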

Graphics API

DX11 was designed for the desktop (a long time ago: 2008!)
• Abstracts a variety of different GPU architectures
• Manages VRAM residency for you; over-subscribing VRAM is a serious performance pitfall
• Handles hazards

Developers can handle these at a higher level => less cost

Xbox One will run vanilla DX11 PC code
• Easy port
• Extensions available for low-level access

Graphics API

DX11.X: some DX12 features are available right now on Xbox:
• Turn off hazard tracking
• Simple fence API
• Deferred contexts re-implemented
• New resource descriptor model
• Draw bundles

(Xbox specific, not the DX12 API)

DRAM – Contention

The CPU cannot saturate DRAM bandwidth on its own; the GPU can!
• Significant performance degradation from DRAM contention
• Fancy CPU features don’t help if memory starved

10. Use ESRAM as much as possible
20. Leave DRAM for the CPU and DMA
30. goto 10;

DRAM – Love your bandwidth

1. Hardware data cache pre-fetch units are awesome. Manual pre-fetch is near pointless once hardware pre-fetch is spinning, and bandwidth is wasted if you are only operating on small arrays.
2. Write-combined memory pages and SSE streaming-store instructions bypass the cache. There is no load of the line before the write, which halves the bandwidth the CPU consumes for those stores.
3. Pack your data! Expanding and compressing data is cheap on both CPU and GPU: there are F16C (half <-> float) CPU instructions, and store-to-load forwarding avoids LHS stalls.
4. Swizzle your textures: the Move Engines can swizzle on copy.
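Point 3 in practice: storing a value known to lie in [0, 1] as a 16-bit unsigned normalized integer instead of a 32-bit float halves the bytes it costs to move. This portable sketch is chosen for illustration and is not F16C half-float conversion; on x86 that is exposed through compiler intrinsics such as `_mm_cvtps_ph` and `_mm_cvtph_ps` when F16C is enabled. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Pack a value in [0, 1] into a 16-bit unsigned normalized integer.
// Out-of-range inputs are clamped; NaN is treated as a caller bug.
std::uint16_t pack_unorm16(float v) {
    assert(!std::isnan(v));
    const float clamped = std::fmin(std::fmax(v, 0.0f), 1.0f);
    return static_cast<std::uint16_t>(std::lround(clamped * 65535.0f));
}

// Expand back to float; the round-trip error is at most half a step (1 / 131070).
float unpack_unorm16(std::uint16_t q) {
    return static_cast<float>(q) / 65535.0f;
}
```

The same trade applies on the GPU side, where 16-bit normalized formats can be read back as floats by the texture units.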

Audio

Custom audio hardware: very fast, lots of features. Kinda cool!

Nuff said.

3x Operating Systems

1. ERA (Exclusive Resource Allocation): only one active at a time; custom OS (games!)
2. SRA (Shared Resource Allocation): Win8 core (apps)
3. Hypervisor: SRA and ERA use different virtual address spaces

PLM (Program Lifetime Management)

The ERA can be in one of several states:
1. Full screen: full resources (even with a snapped app up)
2. Constrained (windowed): slightly less CPU and GPU resource, no input, same amount of memory
3. Suspended: zero CPU and GPU resource, no input, same amount of memory

There is limited time to save after receiving a suspend message.
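The three ERA states can be captured as a small table that game code consults. The names below are hypothetical and this is not the Xbox One PLM API; the sketch only encodes what the slide states: memory is the same in every state, input and CPU/GPU time are not, and a move to Suspended is the moment to save quickly.

```cpp
#include <cassert>

// Hypothetical names for the three ERA states described on the slide.
enum class EraState { FullScreen, Constrained, Suspended };

struct StateBudget {
    bool has_input;     // title receives input
    bool full_cpu_gpu;  // full CPU and GPU allocation
    bool any_cpu_gpu;   // any CPU and GPU time at all
    // Memory is not modelled: per the slide it stays the same in all three states.
};

StateBudget budget_for(EraState s) {
    switch (s) {
        case EraState::FullScreen:  return {true,  true,  true};
        case EraState::Constrained: return {false, false, true};
        case EraState::Suspended:   return {false, false, false};
    }
    assert(false && "unknown ERA state");
    return {false, false, false};
}

// Only limited time is available after a suspend message, so save on that transition.
bool should_quick_save(EraState from, EraState to) {
    return to == EraState::Suspended && from != EraState::Suspended;
}
```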

Kinect 2.0

Hardware:
• Higher resolution colour and depth
• Better ranges
• New: infrared!
• Microphone array
• No tilt motor

Software:
• Improved skeletal tracking
• Improved biometrics

Streaming install

• 6x Blu-ray = ~26 MiB/s
• Installing a 50 GiB Blu-ray at ~26 MiB/s takes ~33 minutes
• Too long to wait… bored now…
• The game must start after an initial payload has been installed
• While running, the title can hint at what to install next
• No direct access to the Blu-ray
• Could be a digital download
• It’s obvious, but I’ll say it anyway: compress your assets!
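The ~33 minute figure follows directly from the slide's numbers, and a one-line helper makes it easy to redo the sums for other payload sizes. Binary units are assumed throughout (1 GiB = 1024 MiB), and seek time and decompression cost are ignored; the function name is hypothetical.

```cpp
#include <cassert>

// Seconds needed to install size_gib at rate_mib_per_s (1 GiB = 1024 MiB).
double install_seconds(double size_gib, double rate_mib_per_s) {
    assert(rate_mib_per_s > 0.0);
    return size_gib * 1024.0 / rate_mib_per_s;
}
```

For the full 50 GiB disc at ~26 MiB/s this gives about 1,969 seconds, roughly 32.8 minutes, which the slide rounds to ~33. A hypothetical 4 GiB initial payload would take about 2.6 minutes.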

The Cloud

Cloud compute:
• Developer’s code is hosted and executed in Windows Azure
• Game code execution automatically scales based upon usage

Live services:
• Stats, analytics, matchmaking & storage

Secure!

Challenges

1. Is your code 64-bit compliant?
2. Can you scale to 6 cores?
3. Adopt the new DX11.X API extensions: manage your own resource hazards
4. Make sure you use ESRAM effectively
5. Package content for streaming install

Game design considerations:
6. Quick save on ERA termination
7. Kinect, SmartGlass
8. Cloud services
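For question 2, a common first step is splitting a batch of independent items across a fixed number of worker threads. This sketch only computes each worker's contiguous slice, with slice sizes differing by at most one, and leaves thread creation to your job system. The worker count of 6 comes from the slide; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Half-open range [begin, end) of items handled by `worker` out of `workers`.
// The first (items % workers) workers each take one extra item.
std::pair<std::size_t, std::size_t> worker_range(std::size_t items, std::size_t workers,
                                                 std::size_t worker) {
    assert(workers > 0 && worker < workers);
    const std::size_t base = items / workers;
    const std::size_t extra = items % workers;
    const std::size_t begin = worker * base + (worker < extra ? worker : extra);
    const std::size_t size = base + (worker < extra ? 1 : 0);
    return {begin, begin + size};
}
```

With 20 items on 6 workers, the first two workers take 4 items each and the remaining four take 3, so the slices are [0,4), [4,8), [8,11), [11,14), [14,17) and [17,20).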

Thank You! – Questions?

(That I’m allowed to answer)

© 2014 Microsoft. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.