Exponential growth of ICT: how long can it last?
Arne Sølvberg NTNU-Norwegian University of Science and Technology
Dept. Computer & Information Sciences Trondheim
Norway
1 Arne Sølvberg, Vilnius July 2012
Example: High Performance Computers at NTNU 1986-2012
• 1986: 0.5 Gigaflops, 80 mill NOK → ~160 mill NOK/Gflops
• 2006: 7.5 Teraflops, 30 mill NOK → ~4000 NOK/Gflops
• 2012: 0.5 Petaflops, 55 mill NOK → ~110 NOK/Gflops – (110 NOK ~ 15 €)
• Performance/price increased by a factor of ~1 million over 26 years
• Performance/price doubles every ~15 months
• (Moore’s law: #transistors/area doubles every 18 months)
• How long can this continue??
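The ~15-month doubling time follows directly from the slide's own figures; a minimal back-of-the-envelope check (Python used purely for illustration):

```python
import math

def doubling_time_months(ratio: float, years: float) -> float:
    """Months per doubling, given an overall improvement ratio over a period."""
    return years * 12 / math.log2(ratio)

# Slide figures: price per Gflops fell from ~160 million NOK (1986)
# to ~110 NOK (2012), i.e. performance/price improved ~1.45-million-fold.
ratio = 160e6 / 110
months = doubling_time_months(ratio, 2012 - 1986)
print(f"improvement ratio ~{ratio:.3g}, doubling every ~{months:.1f} months")
```

With these numbers the doubling time comes out at roughly 15 months, consistent with the bullet above.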
Computers 2012 characteristics
• Transistor feature size approx. 30 nm
• Clock frequency ~ 2-4 GHz
• > 100 mill transistors/chip
• Peak computer performance is 1-10 petaflops
– Latest IBM HPC delivers 16 petaflops
• Heading for exaflop computers
• Energy consumption is a stumbling block
Digital devices:
From vacuum tubes to integrated circuits
• 1940-50 Vacuum tubes (very high heat production)
• 1949-59 Transistors (much less heat production)
• 1954 First silicon transistor (even less heat prod.)
• 1960 First MOS transistor – MOS = Metal Oxide Semiconductor (nMOS, pMOS)
• 1976 First CMOS transistor (CMOS = Complementary MOS) – Combination of nMOS and pMOS
– Generates heat only when the gate switches
• Today: Replacement of silicon based CMOS??????
Electricity emerges as a major cost item for running a normal server
• 2007: servers consumed about 1.5 percent of the total electricity generated in the United States in 2006
• 2012: electricity costs comparable to hardware costs over a server’s lifetime
• electricity use may become a primary limiting factor in the growth of aggregate computing performance
• server energy use was in 2007 expected to double every 4-5 years
• 2016: server electricity use in USA expected (in 2007 forecast) to surpass 5 percent of the total U.S. generating capacity
• Green Computing is important for IT costs as well as for the environment
Situation description & relevant questions
• Computers are everywhere
• Humans, organizations and computers interact tightly and become increasingly interdependent
• Exponential growth in ICT performance/price is the rule-of-the-game
• What are the limits to growth?
• Which are the “drivers” of ICT development?
• Which system components are central for understanding future ICT developments?
Electricity and computers
A timeline in roughly 50-year spans (1864 – 1912 – 1962 – 2012):
• 1864 Maxwell’s theory of electromagnetism
• 1886 First induction motor (Tesla)
• 1897 First indoor lighting of a building (Library of Congress, Edison)
• 1912 My grandfather’s farm got electric lighting
• 1932 Electricity had existed “forever”; the Turing machine theory is published
• ~1945 von Neumann’s report on the EDVAC
• 1953 First commercial computer
• ~1962 Widespread public computer centers (15–20 years after the first computers)
• 1982 The PC arrives, personal computing for the masses – for those born after 1980, computers have existed “forever”
• 1992 Internet takes off
• 2012 2–3 year olds play with smart phones and tablet computers – computers and the internet have existed “forever”
Drivers of ICT development
• ICT increases human productivity (40-50% of GNP growth due to applications of ICT)
• Exponential growth of computers’ performance
– 50% Bigger Bang/year for electronics
– (60-70% Bigger Bang/Buck/year for computers)
– Necessary for ”the virtuous cycle”: performance increases lead to rapid replacement of computers, which finances technical development, which produces exponential performance increases, and so on
• ICT systems fit better and better to the human ”body and soul”
– Body: tighter interaction between body and computers, e.g., computer implants, touch-pads, multi-touch screens
– Soul: higher abstraction levels for problem solving (modeling and programming)
Performance increases made old software run faster on new computers than on old ones
• Computers were built to achieve almost total isolation between software and hardware in order to provide software portability
• Reprogramming was not necessary when acquiring new hardware
– Cheaper to reuse old software than to reprogram in order to take advantage of increased computational speed
– New functions could be supported without extra costs
• Windows OS grew from 5 mill lines of code in 1993 to 50 mill in 2003
• Linux OS grew from 59 mill lines of code in 2000 to 273 mill in 2007
• Consequence: ICT systems have been long-lived
– Old software is modified, not replaced
– Software complexity increases with time
– Competence on old software fades away
– Danger of system failures and crashes increases
[Figure: Single-processor performance from 1986 to 2008 as measured by the benchmark suite SPECint2000 (log scale)]
[Figure: Integer application performance (SPECint2000) over time (1985-2010), log scale]
Source: National Research Council, The Future of Computing Performance: Game Over or Next Level?, Nat’l Academies Press, 2010
Hardware performance is not increasing as fast as before!
• After 2004 exponential performance growth seems to have reached a ceiling
• Price/performance is still improving exponentially, but how long will this last?
• Is it ”Game over”?
• Heat generation is a major stumbling block
Two ways of viewing ”performance”
• Time-to-solution
– Time from start of computation until the result is produced
– Example: a weather forecast must be completed before the weather arrives
• Throughput
– Number of solved tasks per time unit
– Example: web search, many independent tasks
• Performance problems usually come in combinations of the two; for example, web search requires
– maximum time-to-solution of less than a second
– large throughput, measured in number of independent queries processed per second
Short course in computer design
Some important challenges:
• Increase performance ~50% annually without generating more power than air cooling can handle
– 130 W per chip for a PC
– 3 W for handheld equipment (mobile phones)
• How to bring relevant data to the processor quickly enough to avoid a waiting/idle processor
• Less than 1% of transistors on a chip are active simultaneously. How to increase this?
CPU Clocking
• Operation of digital hardware is governed by a constant-rate clock
• [Figure: clock signal; within each clock period, data transfer and computation are followed by a state update]
• Performance increases by maximizing the number of instructions within a “tick”
• Instructions are from a standardized repertoire dating back to the x86 instruction set (Intel 8086 CPU from 1978)
Electronic devices modulate current by changing an energy barrier
• [Figure: digital device with input and output, powered between Vsupply and Vgrd]
• Devices operate between the voltage levels Vsupply and Vgrd
• Vsupply represents the energy barrier that electrons must “jump” in order to provide current through the device, that is, for the device to produce an output signal; currently ~1.2 V
• Vgrd represents the energy barrier that prevents electron leakage when the device is inactive; it must be >> the average thermal energy of the electrons, i.e. the barrier must be >> kT/e ~ 26 mV at room temperature, in practice at least 0.2-0.3 V
• All movements of electrons generate heat
A fundamental problem of electronics
• Vgrd does not scale, so it is hard to scale the supply voltage
• The limitation on Vgrd is set by leakage of carriers over an energy barrier
• At room temperature Vgrd >> kT/e ~ 26 mV – k = Boltzmann’s constant, T = temperature in kelvin, e = electron charge
• Contemporary digital circuits operate with – Vgrd > 100-300 mV
– Vsupply~1-1.2 V
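The ~26 mV figure follows directly from physical constants; a minimal check:

```python
# Thermal voltage kT/e at room temperature, matching the slide's ~26 mV figure.
k = 1.380649e-23      # Boltzmann's constant, J/K
e = 1.602176634e-19   # elementary charge, C
T = 300.0             # room temperature, K

kT_over_e_mV = k * T / e * 1000
print(f"kT/e at {T:.0f} K ~ {kT_over_e_mV:.1f} mV")
# Vgrd must sit well above this thermal voltage, hence the 0.2-0.3 V floor
```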
Two major data transport challenges
• Preventing the CPU from waiting for data
– Multiplication A ← B × C requires transport of B and C to the CPU, and of A from the CPU after the multiplication
– Transport time is 2-4 clock ticks across the chip, but 50+ ticks from outside the chip
– Critical issue: transport needed data to the CPU prior to use (while the CPU is busy with other calculations)
• Preventing data from being transported too far (because of the energy used)
– Heat is generated when data is transported
– For 40 nm technology, one CPU operation generates 100 pJ
– Data transport (64-bit words) generates 12.8 pJ per wire-mm
– Data transport from outside the chip generates 10 times as much (approx. 130 pJ)
– If not careful, data transport will generate the lion’s share of the heat
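The slide's pJ figures make the point quantitative; a sketch of the energy budget for a single multiplication, where the wire distances and off-chip operand counts are hypothetical illustrations, not measurements:

```python
# Energy budget sketch for 40 nm technology, using the slide's figures:
# one CPU operation ~100 pJ; moving a 64-bit word costs ~12.8 pJ per mm
# of on-chip wire, and ~130 pJ extra when it crosses the chip boundary.
OP_PJ = 100.0
WIRE_PJ_PER_MM = 12.8
OFF_CHIP_PJ = 130.0

def multiply_energy_pj(wire_mm: float, operands_off_chip: int) -> float:
    """Energy for A <- B x C: one operation plus transport of B, C and A.
    wire_mm is the on-chip distance each of the three words travels;
    operands_off_chip of them additionally cross the chip boundary."""
    transport = 3 * wire_mm * WIRE_PJ_PER_MM + operands_off_chip * OFF_CHIP_PJ
    return OP_PJ + transport

# Operands nearby vs. fetched across long wires from memory:
print(multiply_energy_pj(1.0, 0))   # data close to the ALU: ~138 pJ
print(multiply_energy_pj(10.0, 2))  # long wires, two off-chip operands
```

In the second case transport alone costs several times the 100 pJ of actual computation, which is the slide's point about the lion's share of the heat.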
The heat production dilemma
• Every computation requires electric energy
• Energy becomes heat, which must be removed
• The cost of providing and removing energy must be less than the benefit of increasing the computational performance
• Air cooling is the cheapest → max 130 W/chip
• Liquid cooling can be used for HPCs, but not for PCs, smart phones and moderately sized servers – e.g., what happens when the cooling liquid leaks?
• For single-processor computers, the performance limits were reached around 2004
• And then what? How to continue along the exponential growth path?
Clock rate dominates the energy generation in contemporary CMOS
• Energy generation is determined by the formula
Power ~ (number of gates) × (C_load per gate) × (clock rate) × (Vsupply)²
• Energy generation could be kept constant by lowering Vsupply as the number of gates increased (remember Moore’s law)
• Increases in clock rate soon outpaced all other design parameters, and could not be compensated for by changing the other design parameters
• Max. obtainable clock rate increases with increasing Vsupply
• In the marginal case we may have Power ~ (clock rate)³
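The cubic dependence follows from the formula when Vsupply must scale with the clock rate; a small numeric check (the gate count, load capacitance and voltages are illustrative values, not data for any real chip):

```python
# Dynamic CMOS power, per the slide's formula:
#   Power ~ gates * C_load_per_gate * clock_rate * Vsupply^2
# If Vsupply must be raised in proportion to the clock rate (to keep the
# circuit switching fast enough), power grows with the cube of the clock.

def power(gates: float, c_load: float, clock_hz: float, vsupply: float) -> float:
    return gates * c_load * clock_hz * vsupply ** 2

base = power(1e8, 1e-15, 2e9, 1.2)     # illustrative numbers only
doubled = power(1e8, 1e-15, 4e9, 2.4)  # 2x clock with Vsupply scaled 2x
print(doubled / base)  # factor of 8 = 2^3, the (clock rate)^3 marginal case
```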
Multi-core chips
• Pollack’s rule: doubling the hardware gives only about a 50 percent increase in performance (approx. 1990)
• This led to (early 1990s) performance increases through frequency increases; around 2000 this created a serious heat problem
• The relation Power ~ (clock rate)³ added to the problem
• Multi-core chips provided a way out of the heat dilemma: many processors on the same chip cooperate to provide better performance, at a lower clock rate than a single-processor chip
• A chip with many cores will produce the same heat as a single-processor chip, provided that the clock rate is the same
• If several cores can be made to cooperate, the performance/watt may increase many-fold compared to a single-core chip
• Exploiting parallelism became a key issue
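Pollack's rule is often stated as single-core performance growing roughly with the square root of the die area spent on it; an idealized sketch under that assumption (and assuming a perfectly parallel workload, which real programs are not):

```python
import math

# Pollack's rule sketch: single-core performance ~ sqrt(die area),
# while power grows roughly linearly with area at a fixed clock rate.

def core_perf(area: float) -> float:
    return math.sqrt(area)

# Doubling a single core's hardware: ~41% more performance
# (the slide's "about a 50 percent increase").
print(core_perf(2.0) / core_perf(1.0))

# The same doubled area spent on two cooperating cores, on a perfectly
# parallel task, yields 2x the original performance at similar power:
print(2 * core_perf(1.0))
```

This is why, once clock rates hit the heat wall, spending transistors on more cores rather than bigger cores became the better performance-per-watt bargain.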
Fuller & Millett: «Computing Performance: Game Over or Next Level?», IEEE Computer, January 2012, pp. 31-38
Inside the Processor
• AMD Barcelona: 4 processor cores (4 separate computers)
• Two caches per core (L1 and L2)
• One shared cache (L3)
An Experiment in Multi-core efficiency
Lien et al.: Case Studies of Multi-core Energy efficiency in Task Based Programs
Proc. HiPEAC3 – Gøteborg 24/4-2012
Compute intensive program
• CEPIS - paper
Data intensive program
Multi-core HW/SW interplay and energy efficiency — examples and ideas
Lasse Natvig, CARD group, Dept. of Computer Science (IDI), NTNU & HPC section, NTNU – HiPEAC3, Gøteborg 24/4-2012
http://research.idi.ntnu.no/multicore
Three major paths to improving price/performance ratio
• New computer architectures
– More effective utilization of silicon-based circuits, e.g., multi-core chips
– Special purpose computers, e.g. graphics processors
– Quantum computers, bio-computers, nano-computers
• New computer devices – Quantum devices, nano-scale devices
– Optical devices
– Bio devices
• Software/languages for parallel programming
New Computer architectures: Special purpose vs. general purpose computer designs
• General purpose computers
– Can be programmed
– Can be debugged
– Are based on standard electronics and produced in vast quantities
• Special purpose computers
– Have much more logic hardwired in the electronic circuitry
– Use much less energy than general purpose computers
– Must be produced in vast quantities to offset the very high engineering costs of producing the required electronics
• Market considerations determine the balance between special purpose and general purpose solutions
New Devices built on new principles: Quantum effects in next generation?
• E.g., the tunneling field-effect transistor
– the gate is controlled by quantum tunneling rather than by thermal injection
– reducing the gate voltage from ~1 volt to 0.2 volts
– reducing power consumption by up to 100x
• Optical logical gates to replace electronic circuitry? Probably no energy savings replacing electrons with photons
• Nano-materials? Graphene based transistors and logical gates are reported to work in laboratory scale (summer of 2011)
• Bio devices: Can we learn from the life-sciences and build computers based on DNA principles?
Parallel processing is not new, but is now more important than before
• Search engine design
– How can Google deliver search results for all of the world’s data in a fraction of a second?
• Distribute data on a large number of computers
• Parallel search on all of the computers simultaneously
– A large number of search questions can be processed simultaneously
• Large-scale simulations
– Matrix manipulation, parallelization
• 1994 proposal for a Fortran extension for parallel programming (of matrix operations)
– Lacked good diagnostic tools & lacked portability
– Did not fly
• 2009 proposal for Coarray Fortran
Amdahl’s Law
Amdahl’s law sets the limit to which a parallel program can be sped up:
• Programs can be thought of as containing one or more parallel sections of code that can be sped up with suitably parallel hardware, and a sequential section that cannot be sped up.
• The faster the parallel sections of the code run, the more the remaining sequential code looms as the performance bottleneck.
• In the limit, if the parallel section is responsible for 80 percent of the run time, and that section is sped up infinitely (so that it runs in zero time), the other 20 percent now constitutes the entire run time.
• The program would therefore have been sped up by a factor of 5, but after that no amount of additional parallel hardware will make it go any faster.
• Amdahl’s law: Speedup = 1 / ((1 − P) + P/N), where P is the proportion of the code that runs in parallel and N is the number of processors.
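The formula is easy to evaluate directly; a minimal sketch showing the factor-of-5 ceiling for an 80% parallel program:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: p = parallel fraction of the code, n = processors."""
    return 1.0 / ((1.0 - p) + p / n)

print(amdahl_speedup(0.8, 4))      # 4 processors: well short of the limit
print(amdahl_speedup(0.8, 10**9))  # approaches the 1/(1-p) = 5 ceiling
```

With p = 0.8, four processors give only a 2.5x speedup, and even a billion processors cannot push past 5x.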
Pitfall: Amdahl’s Law
• Improving one aspect of a computer and expecting a proportional improvement in overall performance
• T_improved = T_affected / improvement factor + T_unaffected
• Example: multiply accounts for 80 s of a 100 s run time (80% of the work can be parallelized). How much improvement in multiply performance is needed to get a 5× overall speedup?
– 100/5 = 80/n + 20, i.e. 20 = 80/n + 20 → Can’t be done!
• Corollary: make the common case fast
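The slide's worked example (multiply accounts for 80 s of a 100 s run; a 5× overall speedup is unreachable) can be checked by solving the T_improved relation for the required improvement factor; a sketch:

```python
def required_factor(t_total: float, t_affected: float, target_speedup: float):
    """Improvement factor n needed so that t_affected/n + t_unaffected
    equals t_total/target_speedup; returns None when infeasible."""
    t_unaffected = t_total - t_affected
    budget = t_total / target_speedup - t_unaffected
    if budget <= 0:
        return None  # even infinite improvement cannot reach the target
    return t_affected / budget

print(required_factor(100, 80, 4))  # 100/4 = 80/n + 20 -> n = 16
print(required_factor(100, 80, 5))  # budget is zero -> can't be done!
```

A 4× overall speedup is achievable with a 16× faster multiply; 5× is not, because the 20 s of unaffected time already consumes the entire time budget.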
Challenges to increasing performance and energy efficiency through parallelism
• Finding independent operations
• Communicating between operations
• Preserving locality between operations
• Synchronizing operations
• Balancing the load represented by the operations among the system resources
Some characteristics of contemporary high-performance computers (HPC)
• NTNU’s new (anno 2012) high-performance computer (Vilje)
– 24,000 processor cores, 8 cores/chip, 2 chips/node, 2.6 GHz
– 470 Teraflops ~ ½ Petaflop
– Total cost 55 mill NOK → ~15 €/Gflops
– Power consumption 450 kW + 350 kW for cooling ~ the average personal electricity use of 1000 Norwegians (heating, hot water, electrical appliances)
• NTNU’s previous HPC (Njord) was installed in 2006
– 7.5 Teraflops in 2006
– Total cost 30 mill NOK → 4000 NOK/Gflops (~500 €/Gflops)
– Power consumption 150 kW
• Over 6 years (back-of-the-envelope calculations)
– Price/Gflops has decreased by 97.5% → performance/price doubled every ~12 months over 6 years, every ~15 months over 26 years
– Power consumption/Gflops has decreased by 95+% → half-time ~15-16 months
Recapture of heat energy
• Nearly 100% of the electrical energy for HPC computing may be recaptured as hot water
• If the hot water can be used for heating buildings in cold climates the energy costs for computing can be greatly decreased
• Should High Performance Computers be situated close to facilities where garbage is burnt to produce hot water for heating?
Exa-scale computers?
• Exponential performance/price growth is still very much alive
• Exa-scale computers may become heat-generating monsters, assuming current computer design technology and no recapture of energy
– 1 exaflop = 1000 petaflops ~ 2000 current NTNU HPCs
– Current technology requires 2000 × 400 kW per exaflop ~ 7 TWh/year
– Assuming a half-time of 15 months for power consumption, an exaflop computer will need
• in 2020: ~100 GWh/year
• in 2028: ~13 GWh/year
• Exa-scale computers are within reach if current computer development continues!
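The 7 TWh/year baseline follows from the slide's own figures, and the 2020 projection can be reproduced by applying the 15-month half-time; a back-of-the-envelope sketch:

```python
# Back-of-the-envelope for the exaflop projections, using slide figures:
# ~2000 machines of the 2012 NTNU HPC class at ~400 kW each.
HOURS_PER_YEAR = 8760

power_w = 2000 * 400e3                         # ~0.8 GW with 2012 technology
baseline_twh = power_w * HOURS_PER_YEAR / 1e12
print(f"2012 technology: ~{baseline_twh:.1f} TWh/year")  # ~7 TWh/year

# Power per flop halves every ~15 months; project 96 months ahead to 2020.
halvings = 96 / 15
gwh_2020 = baseline_twh * 1000 / 2 ** halvings
print(f"2020: ~{gwh_2020:.0f} GWh/year")  # on the order of the slide's ~100
```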
Game over?
• Exponential growth in peak performance computing will probably continue for next 10 years using silicon technology: Moore’s law is still very much alive, at least down to 10-12 nm
• Increased use of special purpose processors will ensure exponential growth beyond that
• Parallel programming will emerge on a larger scale when necessary, and contribute to exponential growth beyond next 10 years
• New devices built on new materials will most probably emerge within the next 10 years
Myhrvold’s “laws of software”
In 1997, Nathan Myhrvold, former chief technology officer of Microsoft, postulated four “laws of software”:
• 1. Software is a gas. Software always expands to fit whatever container it is stored in.
• 2. Software grows until it becomes limited by Moore’s law. The growth of software is initially rapid, like gas expanding, but is inevitably limited by the rate of increase in hardware speed.
• 3. Software growth makes Moore’s law possible. People buy new hardware because the software requires it.
• 4. Software is limited only by human ambition and expectation. We will always find new algorithms, new applications, and new users.
Thanks for listening