Renowned HPC System Architect Alan Gara Talks About Exascale Pathfinding
Confirms Intel’s Vision for Co-design and Fabric Integration
By Mike Bernhardt
June 2013

Even among a very elite group of HPC experts, Alan Gara is widely recognized and respected as one of the HPC community’s true visionaries and authentic leaders. He not only has a good grasp on what it will take to build a system capable of exaflops, he also directs a world-class technical team currently researching and driving every aspect of exascale development. And it doesn’t hurt that he happens to be doing this under the Intel umbrella, where HPC has become something of a keystone initiative.

Gara is of course best known for his role architecting the IBM BlueGene systems. He joined IBM in 1999, was named an IBM Fellow in 2006, and left to join Intel in June 2011. He went from leading IBM's exascale system research efforts to his current position as Intel’s Chief Exascale Architect in a group Intel refers to as “pathfinding” – an area that falls between research and products.

At Intel, he is once again focused on critical high-end scaling issues such as low-power operation, innovative cooling, resiliency, emerging memory technology, and the next generation of interconnect technology – everything that will need to come together to form the future architecture of exascale.
The Exascale Report is pleased to bring you this feature interview with Intel’s Chief Exascale Architect, Alan Gara.

The Exascale Report (TER): I’m curious about the title of “pathfinding.” Does this hold a special distinction within Intel? How is pathfinding different from research?
GARA: I think it is fairly unique to Intel. I’m not certain of that, but I’m not aware of it being used this way in another organization. At Intel, it has a well-defined meaning. Pathfinding is very much what it sounds like: an early stage in our product development process where we define the high-level direction we will take with a product. Because it is part of the product development process, it has a well-defined schedule and set of completion criteria. Research, which comes before pathfinding, is more unconstrained in terms of timescale as well as the areas that can be explored. In research we take some high risks, and we don’t always anticipate that all research projects will turn into products. In pathfinding, we exit with a clear direction and the necessary elements of product direction.
TER: At the recent symposium held at Argonne to celebrate thirty years of parallel computing, you gave a presentation titled “Application Challenges Resulting From Technologies of the Future.” I find this title quite intriguing. It seems that, in order to understand the application challenges of the next 7-10 years, we need to have a handle on what the new technologies might be that developers will have to work with. Yet it seems like almost every aspect – every technical detail – related to the technologies of the future, particularly with exascale, is up in the air right now. Have you determined some approaches that application developers could actually be using today to ensure they have code that will scale on exaflop-class machines?
GARA: It is true that we anticipate that technologies will play an important role in defining systems of the future and, correspondingly, the things users will need to do to extract performance. This is not as new a direction as it might sound. Users have been adapting to the realities imposed by system architectures for a long time. The most obvious example is that our inability to continue increasing frequency has resulted in users needing to exploit much larger degrees of concurrency.

There are a number of branch points in the future that technology will drive. There will be things that dictate whether we go one way or another, and none of us can predict today which way it will go. However, there are some things that we do anticipate and know will be there as part of all possible directions for exascale. It’s really more a question of degree. Some technologies can, in a sense, save the day – and make the switch easier.
[Photo: Intel Chief Exascale Architect, Alan Gara]

One piece of advice that I like to give to users is that they should really be focusing on threading their applications, to enable them, from a system architecture perspective, to extract as much performance as possible from a finite amount of memory – a finite state of their problem. And the reason that’s important is that memory itself is such a big swinger in the whole picture. Right now, if you look at current systems and the way they are balanced, the amount of silicon dedicated to memory is actually quite high, but it’s also not scaling as fast as we are scaling the performance of the compute. And we see that issue getting harder and harder. It’s already skewed quite a bit: there is already much more silicon involved in memory than there is in the processor, and even when you take into account the difference in the cost of wafers, it’s still skewed considerably. Therefore, as we go forward, we can’t just assume that memory scales at the same rate as compute performance. Or, if we really do get one of those revolutionary new memory technologies with the right bandwidth characteristics, resiliency characteristics, etc., then maybe we don’t have to push quite as hard on that dimension. But in any case, you will have to thread – it’s just a question of how much – and while new memory technologies can really help alleviate some of this, they won’t completely eliminate it.
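To make that advice concrete, here is a minimal sketch of what threading against a fixed memory footprint can look like, written in C with OpenMP. The example is illustrative only – the array sizes, names, and the stencil itself are assumptions, not anything from the interview. The point is that all threads share one copy of the problem state rather than each process holding its own:

    /* Threading sketch (illustrative): the threads divide the work but
       share a single working set, so the memory footprint stays fixed
       no matter how many threads run.
       Compile with: cc -fopenmp -O2 threads.c */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    static double a[N], b[N];          /* one shared working set, ~16 MB */

    int main(void) {
        /* Each thread receives a slice of the iteration space. */
        #pragma omp parallel for
        for (int i = 1; i < N - 1; i++)
            b[i] = 0.5 * (a[i - 1] + a[i + 1]);   /* simple 1-D stencil */

        printf("ran with up to %d threads\n", omp_get_max_threads());
        return 0;
    }

Contrast this with a process-per-core model, in which each rank typically carries its own copy of halo and bookkeeping state: as core counts grow and memory per core shrinks, the shared-state threading model extracts more performance from the same bytes.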
One interesting thing about the technologies of the future is that while in the past we have often grappled with what we could no longer do, in the future we see technology areas which open up the possibility of doing things that we could not do before. An example of this is in the area of memory technologies, which will potentially allow us to turn the clock back a bit on the tradeoff between bandwidth and capacity. We currently can’t have both, which is what results in the layered cache hierarchy. Some of the new memory technologies allow us to change this constraint somewhat. I am not suggesting that we will be able to eliminate caches entirely, but some of these technologies do have the potential to simplify the hierarchy somewhat.
TER: Considering all the technical breakthroughs we need in order to reach exascale, how important is a new memory technology to achieving this goal?
GARA: Achieving exascale will be an amazing accomplishment, one that is likely at first to be focused on solving important, highly scalable problems. The biggest challenge in reaching exascale is to do it in a manner that enables accessible performance, reasonable total system power, high reliability, and reasonable cost. And… to achieve this in a reasonable timescale. We know how to do each of these in isolation, but doing all of them simultaneously represents the real challenge.

Memory comes into the exascale challenge in a number of ways. The most important dimension is energy efficiency. This is more a memory microarchitecture innovation as opposed to a fundamentally new physical device. For us to achieve exascale, we need to dramatically reduce the energy needed to access memory. Of course, there is also the possibility that new device technology could help energy efficiency. Right now, though, most of the energy associated with memory is not attributable to the actual physical memory cell.

New memory technologies are extremely important for the future. We know that the scaling of the physical DRAM device is getting much more difficult going forward. We have been struggling for some time with memory density improving at a much slower rate than we are able to increase compute performance. This has put extreme stress on users, and without new memory technologies this skewing will continue. We already have much more silicon area in the memory than we have for the compute. Either we find a new memory technology that eases this pressure by allowing significantly higher densities, or users will feel the pinch even more as we go forward. We want to build machines that are at least as usable as current machines, so moving these memory technologies forward is really critical.
TER: Are there any emerging memory technologies that you find particularly promising?
GARA: There are many that show incredible promise, and I would find it hard to bet on any one horse right now. We expect to see a lot of experimentation with these new technologies at the system level. They each have their strengths – their own attributes in terms of performance, resiliency, power, etc. – but the key drivers are capacity per dollar of memory and bandwidth per dollar. Power is really the fundamental challenge we face in getting to exascale; it’s very high on our list of focus areas.

There are also new memory technologies that can be integrated directly on the compute die, and there are some that cannot. We could very well find that some of the options that could be integrated are not really optimal in terms of performance – or capacity per dollar – but because they can be integrated, they bring a different value. So we may want some of these, plus some other new memory technologies, to deal with the capacity problem we are facing.

In other words, there may be multiple new technologies that emerge, with each finding its place within the system architecture. Some may win in the high-capacity, best-$/bit area, such as is needed in far memory and burst buffers for file systems, while others may emerge as viable high-density solutions that can be integrated into the same die as the processor core.
TER: How about Near Threshold Voltage research? Is this part of your research domain – and is it yielding promising results for exascale?
GARA: Near Threshold Voltage really offers an opportunity to get significant improvements in energy efficiency at the transistor level. So yes, it plays a very important role, and we’re looking at near threshold carefully. But as in all things, there’s no free lunch here. Near Threshold Voltage comes at a pretty significant decrease in the performance of those devices: the amount of silicon area you get per device and the performance of that device both go down. The reality is that what we really need is energy efficiency at the system level.

And since energy efficiency is probably our biggest challenge, this is a very important part of our research. In assessing these technologies we need to take a broad system view. It is not just a question of how efficient a single transistor is, but of how efficient a system built out of such transistors is for real applications.

In other words, we need to transition our thinking from energy efficiency at the transistor level to energy efficiency at the system level.
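To put rough numbers behind that tradeoff, the standard first-order CMOS relations are useful (these are textbook device models, not figures from the interview):

    E_dyn ≈ C · Vdd²                      (dynamic energy per switching event)
    f ∝ (Vdd − Vt)^α / Vdd,  α ≈ 1.3      (alpha-power law for achievable frequency)

Lowering the supply voltage Vdd toward the threshold voltage Vt cuts energy per operation roughly quadratically, but the achievable frequency collapses as Vdd − Vt shrinks – which is why near-threshold operation only pays off at the system level when the application has enough parallelism to make up for the slower clock.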
When we explore the question of whether Near Threshold Voltage shows promising results for exascale, getting to an answer is much more complex than a simple yes or no, because it makes assumptions about what user applications will look like in 5 to 10 years. We know that if our only requirement were to build a system that could achieve 1 EF/s on a simple code, we would be able to do this by the end of the decade. But we would not want to build a machine that is not highly usable, so the degree to which we push in directions like near threshold voltage is tempered by this. The long-term answer is that we will be able to operate in many different domains: at very low voltages when the application can exploit extreme levels of parallelism, and in a mode that is optimal for algorithms that have far less parallelism. As an architect, my job is to make this as accessible as possible to the user and, where viable, hide this complexity.

When we look at this from a system perspective, we have to look at many more things than the device level – it comes down to the algorithms. There isn’t any one answer; there can be multiple answers depending on your algorithm. Maybe you have an enormous amount of concurrency and frequency doesn’t matter – what really matters to you is that you can run a nearly infinite number of threads – and in that case, Near Threshold Voltage could be exactly what you are looking for. On the other hand, there are parts of algorithms that we typically see that don’t have that behavior, where at least for some time period the performance of a single thread is the limiter. As a result, I think the real answer here is that we need to provide devices and architectures that allow us to do both and to use the right one at the right time, so we are working on techniques to do that both within a single device and core, as well as in more heterogeneous types of architectures. There are, again, plusses and minuses in those two approaches. But I think we need to maintain that level of flexibility in the architecture, because assuming that we can drop the frequency by 10x and still continue to scale performance would be naïve. While things are moving quickly in the right direction, it is going to take a long time before frequency doesn’t matter to the majority of applications.
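One way to quantify that caution is Amdahl’s law (my framing of the argument, not Gara’s): if a fraction s of a program’s work is serial, the speedup from N threads is bounded by

    S(N) = 1 / (s + (1 − s)/N) ≤ 1/s

So even a code that is 95% parallel (s = 0.05) tops out at a 20x speedup on unlimited threads, and on a machine whose clock is 10x slower the best net gain over the original single-threaded run is about 2x. Threads alone cannot rescue an aggressive frequency drop unless the serial fraction is very small.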
TER: One industry luminary was recently quoted as saying, “All HPC systems in the future will have GPUs.” Would you agree with this comment?

GARA: The industry has a long history of absorbing things that were once considered accelerators into the baseline architecture. One example is floating point units: these used to be add-on accelerator devices, much like GPUs are today. So in that context, I would not be surprised to see some aspects of what we currently think of as GPUs become baseline features. On the other hand, accelerators like GPUs have been fairly difficult for the community to use. They have been explored in HPC for more than a decade, and there remain very few production codes which have shown better performance. Some of this is due to them not being integrated more closely into the processor. You can see the trend that GPUs are going to be integrated more tightly with a processor. I don’t expect that most systems will be built with add-on cards similar to how GPUs are configured today; this direction would likely continue to have power and performance challenges. Valued features and concepts will be integrated into a CPU where it makes sense. As we have more transistors available in the future, integrating accelerators is a viable direction that we are exploring, but they need to have enough application reach to justify the silicon area.
If you look at the GPU roadmap, I think what you’ll see is GPUs morphing toward CPUs in a lot of what they are doing, trying to deal with the fact that they are just too far away from a general purpose processor.
TER: Raj Hazra recently talked about fabric integration and the critical importance of co-design – not just vertical co-design, but also sideways, or horizontal, co-design – as key to achieving exascale. Is your group responsible for co-design strategies? What can you tell us about progress in this area?
GARA: Fabric integration is one of the natural next steps we need to take. We need it to deal with the latencies, performance levels, and cost levels that are necessary if we want to stay on this exponentially improving performance curve. I think we at Intel have made a great deal of progress in that area – including some of the recent acquisitions – so we certainly take this very seriously, and I anticipate it being part of our roadmap in the future.

Co-design is fundamental to our being able to build systems that are usable, cost effective, and power efficient. All the systems development efforts within Intel are very focused on this. We have made a lot of progress in this area, and we can see it bearing fruit in our products. With the inevitable technology disruptions that will be adopted in HPC, there is really no other way to proceed effectively and have any hope that what we are building makes sense. One big advance has been engagements with government agencies, where we are now involved in a number of programs – programs that provide critical feedback from the application experts on possible architectural directions long before the technology enters pathfinding. This is extremely important to us, as we need to make technology and architecture choices at least five years before products are generally available; hence we are designing for applications 5-10 years out.
TER: A number of spokespeople at Intel feel confident we will achieve exascale-level computation by the end of the decade. I assume you agree with this position, but what area of technology innovation do you see as a possible deal breaker?
GARA: Getting to exascale-level computing by the end of the decade is certainly possible, but there are a number of challenges that we still need to overcome. As I mentioned, the biggest of these is probably energy efficiency; if one removes the energy efficiency constraint, this becomes much easier to achieve.

There will need to be many innovations to pull this off. New memory microarchitectures are absolutely critical. Similarly, we will be depending on continued scaling of our silicon technology. The last I would mention is silicon photonics: supercomputers will be pushing the communications requirements, and without silicon photonics the cost of a highly usable system would likely be prohibitive.
TER: As Intel’s Chief Exascale Architect, what is your perception of U.S. technology leadership today – and do you think the U.S. has any chance of being the first nation to field an exascale-class system?
GARA: It is really the U.S.’s to lose, in some sense. There is no nation better positioned to achieve this. On the other hand, we are seeing enormous investments being made in this area in many countries. We are in an era where HPC is really blossoming in terms of adoption; it is recognized by many countries as critical to their national competitiveness. A lot depends on how quickly the U.S. government responds and how it gets behind this. If the U.S. does not aggressively invest in HPC, the country could find itself in a very tough position, and it will be much harder to come back to a technology and computing leadership position.
TER: As a community, what are we doing wrong, or what could we be doing differently as it relates to exascale research?

GARA: I think the emerging U.S. exascale community is like a family – everyone is in this together. I would not really call out any area where we are clearly doing something wrong. There are of course areas where this community could be doing better – there always will be – but the emerging exascale community is a very close one, thriving on very strong and widespread collaboration. One area where the HPC community sometimes struggles is with the definition of “goodness.” The idea of ranking supercomputers has had a dramatic impact in helping the community focus on a concrete goal. But the time has come to change the way we measure these systems, or we risk pushing designs in a direction that does not make sense for general applications. This is being worked on. It is very important to keep the community focused, but we need a metric that keeps us all moving in the right direction.
© 2013, The Exascale Report ™
Alan Gara’s bio can be found at:
http://newsroom.intel.com/community/intel_newsroom/bios?n=Alan Gara&f=searchAll