NVIDIA’s Experience with Open64

Mike Murphy

NVIDIA

© NVIDIA Corporation 2008

Outline

Why Open64

How we use Open64

What we did to Open64

Future work in Open64


Compiling CUDA for GPUs

NVCC

C/C++ CUDA Application

GPU Code, CPU Code

executable


Why Open64

We had a low-level code generator for graphics codes, but for CUDA needed high-level optimization for C/C++ codes.

Options considered: write our own, gcc, Open64


take too long good long-term support

best performance

(kudos to PathScale)


NVCC processing of GPU code

cudafe

C code for GPU

nvopencc (Open64)

ptx

OCG

object code


Changes: Rehosting Open64

Our compiler has to run on 32- and 64-bit Linux, 32- and 64-bit Windows, and Mac OS.

Main Open64 source tree is only for Linux.

This is an area where sharing our changes can help grow the user base by making it easier to port Open64.

For Windows we build using Cygwin’s MinGW.


Changes: Memory and registers

We don’t have a stack or fast memory

Therefore want to keep data in registers

Inline everything and optimize as much as possible

Try to keep small structs in registers by expanding struct copies into field copies (versus taking address and generating loop to do byte copy)
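The difference between the two kinds of struct copy can be sketched in plain C++. This is only an illustration of the idea, not code from nvopencc; the struct `Float3` and the functions `copy_bytes` and `copy_fields` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical small struct, used only for illustration.
struct Float3 {
    float x, y, z;
};

// Byte-wise copy: takes the address of both structs and copies raw bytes.
// Taking the address tends to force the struct into memory.
void copy_bytes(Float3* dst, const Float3* src) {
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, sizeof(Float3));
}

// Field-wise copy: three independent scalar moves and no address taken,
// so after inlining each field can live in its own register.
Float3 copy_fields(Float3 src) {
    Float3 dst;
    dst.x = src.x;
    dst.y = src.y;
    dst.z = src.z;
    return dst;
}
```

Both functions produce the same result; the field-wise form simply exposes the copy to the optimizer as scalar moves instead of a memory operation.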


Changes: Vector loads and stores

Coalesce adjacent loads and stores for performance

Do this in CG:

Iterate through ops, trying to add to vectors

Check for intervening kills

Change alignment and use dummy regs for padding if it helps to create a wider vector (e.g. may use a 4-word vector for a 3-word struct).
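The steps above can be sketched as a toy pass over a list of ops. This is a simplified model written for illustration, not the actual CG code: the types `Op` and `VecLoad` and the function `coalesce_loads` are hypothetical, only loads are vectorized, every store is conservatively treated as a kill, and alignment is assumed to be adjustable so that a 3-word group can be padded to 4 words.

```cpp
#include <cassert>
#include <vector>

enum class Kind { Load, Store, Other };

// Hypothetical op: a one-word load or store at base + offset, or some other op.
struct Op {
    Kind kind;
    int base;    // id of the base address register
    int offset;  // word offset from the base
};

// Hypothetical result: one (possibly vector) load.
struct VecLoad {
    int base;
    int offset;   // offset of the first word
    int width;    // words loaded: 1, 2, or 4
    int padding;  // dummy registers added to reach the width
};

std::vector<VecLoad> coalesce_loads(const std::vector<Op>& ops) {
    std::vector<VecLoad> out;
    int base = -1, start = 0, count = 0;

    // Emit the current group; a 3-word group is padded to a 4-word vector.
    auto flush = [&]() {
        if (count == 0) return;
        int width = (count == 3) ? 4 : count;
        out.push_back({base, start, width, width - count});
        count = 0;
    };

    for (const Op& op : ops) {
        assert(op.offset >= 0);
        switch (op.kind) {
        case Kind::Load:
            // Extend the group only if this load is adjacent to it.
            if (count > 0 && count < 4 && op.base == base &&
                op.offset == start + count) {
                ++count;
            } else {
                flush();
                base = op.base;
                start = op.offset;
                count = 1;
            }
            break;
        case Kind::Store:
            // Intervening kill (assumed to alias): later loads cannot join
            // the group that precedes the store.
            flush();
            break;
        case Kind::Other:
            break;
        }
    }
    flush();
    return out;
}
```

For example, three adjacent loads followed by a store and a fourth load yield one 4-word vector with one dummy register, plus a separate scalar load after the store.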


Changes: 16bit optimization

Cheaper to use 16bit registers and operations

But C converts shorts to int.

So add pass in CG that converts back to 16bit:

Mark 16bit loads, stores, and converts

Propagate 16bit-ness forwards and backwards

Unmark 16bit-ness if cannot be 16bit

Change remaining registers and instructions to be 16bit.
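A minimal sketch of such a mark, propagate, and unmark pass is shown below. It is a toy model under stated assumptions, not the actual nvopencc pass: the instruction set (`Load16`, `Load32`, `Add`, `Store16`, `Store32`), the `Inst` type, and `find_16bit_regs` are all hypothetical, and the unmarking rule is deliberately conservative (an `Add` is narrowed only if all of its registers can be 16-bit).

```cpp
#include <cassert>
#include <vector>

enum class Opc { Load16, Load32, Add, Store16, Store32 };

// Hypothetical instruction over virtual registers; -1 means "unused".
struct Inst {
    Opc opc;
    int dst;
    int src0;
    int src1;
};

// Returns, for each register, whether it may be narrowed to 16 bits.
std::vector<bool> find_16bit_regs(const std::vector<Inst>& code, int num_regs) {
    std::vector<bool> is16(num_regs, false);

    // 1. Mark registers touched by 16-bit loads and stores.
    for (const Inst& i : code) {
        assert(i.dst < num_regs && i.src0 < num_regs && i.src1 < num_regs);
        if (i.opc == Opc::Load16) is16[i.dst] = true;
        if (i.opc == Opc::Store16) is16[i.src0] = true;
    }

    // 2. Propagate 16-bit-ness forwards and backwards through Add.
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Inst& i : code) {
            if (i.opc != Opc::Add) continue;
            bool any = is16[i.dst] || is16[i.src0] || is16[i.src1];
            bool all = is16[i.dst] && is16[i.src0] && is16[i.src1];
            if (any && !all) {
                is16[i.dst] = true;
                is16[i.src0] = true;
                is16[i.src1] = true;
                changed = true;
            }
        }
    }

    // 3. Unmark registers that cannot be 16-bit, until nothing changes.
    changed = true;
    while (changed) {
        changed = false;
        auto unmark = [&](int r) {
            if (r >= 0 && is16[r]) { is16[r] = false; changed = true; }
        };
        for (const Inst& i : code) {
            if (i.opc == Opc::Load32) unmark(i.dst);
            if (i.opc == Opc::Store32) unmark(i.src0);
            if (i.opc == Opc::Add &&
                !(is16[i.dst] && is16[i.src0] && is16[i.src1])) {
                unmark(i.dst);
                unmark(i.src0);
                unmark(i.src1);
            }
        }
    }

    // 4. Registers still marked can be rewritten as 16-bit.
    return is16;
}
```

The sketch relies on the fact that the low 16 bits of an addition depend only on the low 16 bits of its inputs, so a chain that starts at 16-bit loads and ends at 16-bit stores can run entirely in 16-bit registers.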


Future work

1 person -> 4 people working with Open64

New application TBA

Merging changes into trunk

Thanks to Sun Chan and Shin!

Investigating register pressure in WOPT

Want better control of register pressure during optimization

Investigating using other features (LNO, IPA, etc.)


Questions?

http://www.nvidia.com/CUDA

