Renowned HPC System Architect Alan Gara Talks About Exascale Pathfinding
Confirms Intel’s Vision for Co-design and Fabric Integration
By Mike Bernhardt
June 2013

Even among a very elite group of HPC experts, Alan Gara is widely recognized and respected as one of the HPC community’s true visionaries and authentic leaders. He not only has a good grasp on what it will take to build a system capable of exaflops, he also directs a world-class technical team currently researching and driving every aspect of exascale development. And it doesn’t hurt that he happens to be doing this under the Intel umbrella, where HPC has become something of a keystone initiative.

Gara is of course best known for his role architecting the IBM BlueGene systems. He joined IBM in 1999, was named an IBM Fellow in 2006, and left to join Intel in June 2011. He went from leading IBM's exascale system research efforts to his current position as Intel’s Chief Exascale Architect in a group Intel refers to as “pathfinding” – an area that falls between research and products.

At Intel, he is once again focused on critical high-end scaling issues such as low-power operation, innovative cooling, resiliency, emerging memory technology, and the next generation of interconnect technology – everything that will need to come together to form the future architecture of exascale.
The Exascale Report is pleased to bring you this feature interview with Intel’s Chief Exascale Architect, Alan Gara.

The Exascale Report (TER): I’m curious about the title of “pathfinding.” Does this hold a special distinction within Intel? How is pathfinding different from research?
GARA: I think it is fairly unique to Intel. I’m not certain of that, but I’m not aware of it being used this way in another organization. At Intel, it has a well-defined meaning. Pathfinding is very much what it sounds like: an early stage in our product development process where we define the high-level direction we will take with a product. Because it is part of the product development process, it has a well-defined schedule and set of completion criteria. Research, which comes before pathfinding, is more unconstrained in terms of timescale as well as the areas that can be explored. In research we take some high risks, and we don’t always anticipate that all research projects will turn into products. In pathfinding, we exit with a clear direction and the necessary elements of product direction.
TER: At the recent symposium held at Argonne to celebrate thirty years of parallel computing, you gave a presentation titled “Application Challenges Resulting From Technologies of the Future.” I find this title quite intriguing. It seems that, in order to understand the application challenges of the next 7-10 years, we need to have a handle on what the new technologies might be that developers will have to work with. Yet it seems like almost every aspect – every technical detail – related to the technologies of the future, particularly with exascale, is up in the air right now. Have you determined some approaches that application developers could actually be using today to ensure they have code that will scale on exaflop-class machines?
GARA: It is true that we anticipate that technologies will play an important role in defining systems of the future and, correspondingly, the things users will need to do to extract performance. This is not as new a direction as it might sound. Users have been adapting to the realities imposed by system architectures for a long time. The most obvious example is that our inability to continue increasing frequency has resulted in users needing to exploit much larger degrees of concurrency.

There are a number of branch points in the future that technology will drive. There will be things that dictate whether we go one way or another, and none of us can predict today which way it will go. However, there are some things that we do anticipate and know will be there as part of all possible directions for exascale. It’s really more a question of degree. Some technologies can, in a sense, save the day – and make the switch easier.
[Photo: Intel Chief Exascale Architect, Alan Gara]

One piece of advice that I like to give to users is that they should really be focusing on threading their applications, to enable them, from a system architecture perspective, to extract as much performance as possible from a finite amount of memory – a finite state of their problem. And the reason that’s important is that memory itself is such a big swinger in the whole picture. Right now, if you look at current systems and the way they are balanced, the amount of silicon dedicated to memory is actually quite high, but it’s also not scaling as fast as we are scaling the performance of the compute. And we see that issue getting harder and harder. It’s already skewed quite a bit: there is already much more silicon involved in memory than there is in the processor, and even when you take into account the difference in the cost of wafers, it’s still skewed considerably. Therefore, as we go forward, we can’t just assume that memory scales at the same rate as compute performance. Or, if we really do get one of those revolutionary new memory technologies with the right bandwidth characteristics, resiliency characteristics, etc., then maybe we don’t have to push quite as hard on that dimension. But in any case, you will have to thread – it’s just a question of how much – and while new memory technologies can really help alleviate some of this, they won’t completely eliminate it.
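To make that advice concrete, here is a minimal sketch of what threading against a fixed memory footprint can look like, written in C with OpenMP. The example is illustrative only – the array sizes, names, and the stencil itself are assumptions, not anything from the interview. The point is that all threads share one copy of the problem state rather than each process holding its own:

    /* Threading sketch (illustrative): the threads divide the work but
       share a single working set, so the memory footprint stays fixed
       no matter how many threads run.
       Compile with: cc -fopenmp -O2 threads.c */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    static double a[N], b[N];          /* one shared working set, ~16 MB */

    int main(void) {
        /* Each thread receives a slice of the iteration space. */
        #pragma omp parallel for
        for (int i = 1; i < N - 1; i++)
            b[i] = 0.5 * (a[i - 1] + a[i + 1]);   /* simple 1-D stencil */

        printf("ran with up to %d threads\n", omp_get_max_threads());
        return 0;
    }

Contrast this with a process-per-core model, in which each rank typically carries its own copy of halo and bookkeeping state: as core counts grow and memory per core shrinks, the shared-state threading model extracts more performance from the same bytes.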
One interesting thing about the technologies of the future is that while in the past we have often grappled with what we could no longer do, in the future we see technology areas which open up the possibility of doing things that we could not do before. An example of this is in the area of memory technologies, which will potentially allow us to turn the clock back a bit on the tradeoff between bandwidth and capacity. We currently can’t have both, which is what results in the layered cache hierarchy. Some of the new memory technologies allow us to change this constraint somewhat. I am not suggesting that we will be able to eliminate caches entirely, but some of these technologies do have the potential to simplify the hierarchy somewhat.
TER: Considering all the technical breakthroughs we need in order to reach exascale, how important is a new memory technology to achieving this goal?
GARA: Achieving exascale will be an amazing accomplishment, one that is likely at first to be focused on solving important, highly scalable problems. The biggest challenge in reaching exascale is to do it in a manner that enables accessible performance, reasonable total system power, high reliability, and reasonable cost. And… to achieve this in a reasonable timescale. We know how to do each of these in isolation, but doing all of them simultaneously represents the real challenge.

Memory comes into the exascale challenge in a number of ways. The most important dimension is energy efficiency. This is more a memory microarchitecture innovation as opposed to a fundamentally new physical device. For us to achieve exascale, we need to dramatically reduce the energy needed to access memory. Of course, there is also the possibility that new device technology could help energy efficiency. Right now, though, most of the energy associated with memory is not attributable to the actual physical memory cell.

New memory technologies are extremely important for the future. We know that the scaling of the physical DRAM device is getting much more difficult going forward. We have been struggling for some time with memory density improving at a much slower rate than we are able to increase compute performance. This has put extreme stress on users, and without new memory technologies this skewing will continue. We already have much more silicon area in the memory than we have for the compute. Either we find a new memory technology that eases this pressure by allowing significantly higher densities, or users will feel the pinch even more as we go forward. We want to build machines that are at least as usable as current machines, so moving these memory technologies forward is really critical.
TER: Are there any emerging memory technologies that you find particularly promising?
GARA: There are many that show incredible promise, and I would find it hard to bet on any one horse right now. We expect to see a lot of experimentation with these new technologies at the system level. They each have their strengths – their own attributes in terms of performance, resiliency, power, etc. – but the key drivers are capacity per dollar of memory and bandwidth per dollar. Power is really the fundamental challenge we face in getting to exascale; it’s very high on our list of focus areas.

There are also new memory technologies that can be integrated directly on the compute die, and there are some that cannot. We could very well find that some of the options that could be integrated are not really optimal in terms of performance – or capacity per dollar – but because they can be integrated, they bring a different value. So we may want some of these, plus some other new memory technologies, to deal with the capacity problem we are facing.

In other words, there may be multiple new technologies that emerge, with each finding its place within the system architecture. Some may win in the high-capacity, best-$/bit area, such as is needed in far memory and burst buffers for file systems, while others may emerge as viable high-density solutions that can be integrated into the same die as the processor core.
TER: How about Near Threshold Voltage research? Is this part of your research domain – and is it yielding promising results for exascale?
GARA: Near Threshold Voltage really offers an opportunity to get significant improvements in energy efficiency at the transistor level. So yes, it plays a very important role, and we’re looking at near threshold carefully. But as in all things, there’s no free lunch here. Near Threshold Voltage comes at a pretty significant decrease in the performance of those devices: the amount of silicon area you get per device and the performance of that device both go down. The reality is that what we really need is energy efficiency at the system level.

And since energy efficiency is probably our biggest challenge, this is a very important part of our research. In assessing these technologies we need to take a broad system view. It is not just a question of how efficient a single transistor is, but of how efficient a system built out of such transistors is for real applications.

In other words, we need to transition our thinking from energy efficiency at the transistor level to energy efficiency at the system level.
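To put rough numbers behind that tradeoff, the standard first-order CMOS relations are useful (these are textbook device models, not figures from the interview):

    E_dyn ≈ C · Vdd²                      (dynamic energy per switching event)
    f ∝ (Vdd − Vt)^α / Vdd,  α ≈ 1.3      (alpha-power law for achievable frequency)

Lowering the supply voltage Vdd toward the threshold voltage Vt cuts energy per operation roughly quadratically, but the achievable frequency collapses as Vdd − Vt shrinks – which is why near-threshold operation only pays off at the system level when the application has enough parallelism to make up for the slower clock.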
When we explore the question of whether Near Threshold Voltage shows promising results for exascale, getting to an answer is much more complex than a simple yes or no, because it makes assumptions about what user applications will look like in 5 to 10 years. We know that if our only requirement were to build a system that could achieve 1 EF/s on a simple code, we would be able to do this by the end of the decade. But we would not want to build a machine that is not highly usable, so the degree to which we push in directions like near threshold voltage is tempered by this. The long-term answer is that we will be able to operate in many different domains: at very low voltages when the application can exploit extreme levels of parallelism, and in a mode that is optimal for algorithms that have far less parallelism. As an architect, my job is to make this as accessible as possible to the user and, where viable, hide this complexity.

When we look at this from a system perspective, we have to look at many more things than the device level – it comes down to the algorithms. There isn’t any one answer; there can be multiple answers depending on your algorithm. Maybe you have an enormous amount of concurrency and frequency doesn’t matter – what really matters to you is that you can run a nearly infinite number of threads – and in that case, Near Threshold Voltage could be exactly what you are looking for. On the other hand, there are parts of algorithms that we typically see that don’t have that behavior, where at least for some time period the performance of a single thread is the limiter. As a result, I think the real answer here is that we need to provide devices and architectures that allow us to do both and to use the right one at the right time, so we are working on techniques to do that both within a single device and core, as well as in more heterogeneous types of architectures. There are, again, plusses and minuses in those two approaches. But I think we need to maintain that level of flexibility in the architecture, because assuming that we can drop the frequency by 10x and still continue to scale performance would be naïve. While things are moving quickly in the right direction, it is going to take a long time before frequency doesn’t matter to the majority of applications.
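One way to quantify that caution is Amdahl’s law (my framing of the argument, not Gara’s): if a fraction s of a program’s work is serial, the speedup from N threads is bounded by

    S(N) = 1 / (s + (1 − s)/N) ≤ 1/s

So even a code that is 95% parallel (s = 0.05) tops out at a 20x speedup on unlimited threads, and on a machine whose clock is 10x slower the best net gain over the original single-threaded run is about 2x. Threads alone cannot rescue an aggressive frequency drop unless the serial fraction is very small.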
TER: One industry luminary was recently quoted as saying, “All HPC systems in the future will have GPUs.” Would you agree with this comment?

GARA: The industry has a long history of absorbing things that were once considered accelerators into the baseline architecture. One example is floating point units: these used to be add-on accelerator devices, much like GPUs are today. So in that context, I would not be surprised to see some aspects of what we currently think of as GPUs become baseline features. On the other hand, accelerators like GPUs have been fairly difficult for the community to use. They have been explored in HPC for more than a decade, and there remain very few production codes which have shown better performance. Some of this is due to them not being integrated more closely into the processor. You can see the trend that GPUs are going to be integrated more tightly with a processor. I don’t expect that most systems will be built with add-on cards similar to how GPUs are configured today; this direction would likely continue to have power and performance challenges. Valued features and concepts will be integrated into a CPU where it makes sense. As we have more transistors available in the future, integrating accelerators is a viable direction that we are exploring, but they need to have enough application reach to justify the silicon area.
If you look at the GPU roadmap, I think what you’ll see is GPUs morphing toward CPUs in a lot of what they are doing, trying to deal with the fact that they are just too far away from a general purpose processor.
TER: Raj Hazra recently talked about fabric integration and the critical importance of co-design – not just vertical co-design, but also sideways, or horizontal, co-design – as key to achieving exascale. Is your group responsible for co-design strategies? What can you tell us about progress in this area?
GARA: Fabric integration is one of the natural next steps we need to take. We need it to deal with the latencies, performance levels, and cost levels that are necessary if we want to stay on this exponentially improving performance curve. I think we at Intel have made a great deal of progress in that area – including some of the recent acquisitions – so we certainly take this very seriously, and I anticipate it being part of our roadmap in the future.

Co-design is fundamental to our being able to build systems that are usable, cost effective, and power efficient. All the systems development efforts within Intel are very focused on this. We have made a lot of progress in this area, and we can see it bearing fruit in our products. With the inevitable technology disruptions that will be adopted in HPC, there is really no other way to proceed effectively and have any hope that what we are building makes sense. One big advance has been engagements with government agencies, where we are now involved in a number of programs – programs that provide critical feedback from the application experts on possible architectural directions long before the technology enters pathfinding. This is extremely important to us, as we need to make technology and architecture choices at least five years before products are generally available; hence we are designing for applications 5-10 years out.
TER: A number of spokespeople at Intel feel confident we will achieve exascale-level computation by the end of the decade. I assume you agree with this position, but what area of technology innovation do you see as a possible deal breaker?
GARA: Getting to exascale-level computing by the end of the decade is certainly possible, but there are a number of challenges that we still need to overcome. As I mentioned, the biggest of these is probably energy efficiency; if one removes the energy efficiency constraint, this becomes much easier to achieve.

There will need to be many innovations to pull this off. New memory microarchitectures are absolutely critical. Similarly, we will be depending on continued scaling of our silicon technology. The last I would mention is silicon photonics: supercomputers will be pushing the communications requirements, and without silicon photonics the cost of a highly usable system would likely be prohibitive.
TER: As Intel’s Chief Exascale Architect, what is your perception of U.S. technology leadership today – and do you think the U.S. has any chance of being the first nation to field an exascale-class system?
GARA: It is really the U.S.’s to lose, in some sense. There is no nation better positioned to achieve this. On the other hand, we are seeing enormous investments being made in this area in many countries. We are in an era where HPC is really blossoming in terms of adoption; it is recognized by many countries as critical to their national competitiveness. A lot depends on how quickly the U.S. government responds and how it gets behind this. If the U.S. does not aggressively invest in HPC, the country could find itself in a very tough position, and it will be much harder to come back to a technology and computing leadership position.
TER: As a community, what are we doing wrong, or what could we be doing differently as it relates to exascale research?

GARA: I think the emerging U.S. exascale community is like a family – everyone is in this together. I would not really call out any area where we are clearly doing something wrong. There are of course areas where this community could be doing better – there always will be – but the emerging exascale community is a very close one, thriving on very strong and widespread collaboration. One area where the HPC community sometimes struggles is with the definition of “goodness.” The idea of ranking supercomputers has had a dramatic impact in helping the community focus on a concrete goal. But the time has come to change the way we measure these systems, or we risk pushing designs in a direction that does not make sense for general applications. This is being worked on. It is very important to keep the community focused, but we need a metric that keeps us all moving in the right direction.
© 2013, The Exascale Report ™
Alan Gara’s bio can be found at:
http://newsroom.intel.com/community/intel_newsroom/bios?n=Alan Gara&f=searchAll