Exponential growth of ICT: how long can it last?
Arne Sølvberg NTNU-Norwegian University of Science and Technology
Dept. Computer & Information Sciences Trondheim
Norway
1 Arne Sølvberg, Vilnius July 2012
Example: High Performance Computers at NTNU 1986-2012
• 1986: 0.5 Gigaflops, 80 mill NOK → ~160 mill NOK/Gflops
• 2006: 7.5 Teraflops, 30 mill NOK → ~4000 NOK/Gflops
• 2012: 0.5 Petaflops, 55 mill NOK → ~110 NOK/Gflops – (110 NOK ~ 15 €)
• Performance/price increased by a factor of ~1 million over 26 years
• Performance/price doubles every ~15 months
• (Moore’s law: #transistors/area doubles every 18 months)
• How long can this continue??
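The ~15-month doubling time follows directly from the slide's own figures; a minimal back-of-the-envelope check (Python used purely for illustration):

```python
import math

def doubling_time_months(ratio: float, years: float) -> float:
    """Months per doubling, given an overall improvement ratio over a period."""
    return years * 12 / math.log2(ratio)

# Slide figures: price per Gflops fell from ~160 million NOK (1986)
# to ~110 NOK (2012), i.e. performance/price improved ~1.45-million-fold.
ratio = 160e6 / 110
months = doubling_time_months(ratio, 2012 - 1986)
print(f"improvement ratio ~{ratio:.3g}, doubling every ~{months:.1f} months")
```

With these numbers the doubling time comes out at roughly 15 months, consistent with the bullet above.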
Computers 2012 characteristics
• Transistor feature size approx. 30 nm
• Clock frequency ~ 2-4 GHz
• > 100 mill transistors/chip
• Peak computer performance is 1-10 petaflops
– Latest IBM HPC delivers 16 petaflops
• Heading for exaflop computers
• Energy consumption is a stumbling block
Digital devices:
From vacuum tubes to integrated circuits
• 1940-50 Vacuum tubes (very high heat production)
• 1949-59 Transistors (much less heat production)
• 1954 First silicon transistor (even less heat prod.)
• 1960 First MOS transistor – MOS = Metal Oxide Semiconductor (nMOS, pMOS)
• 1976 First CMOS transistor (CMOS = Complementary MOS) – Combination of nMOS and pMOS
– Generates heat only when the gate switches
• Today: Replacement of silicon based CMOS??????
Electricity emerges as a major cost item for running a normal server
• 2007: servers consumed about 1.5 percent of the total electricity generated in the United States in 2006
• 2012: electricity costs comparable to hardware costs over a server’s lifetime
• electricity use may become a primary limiting factor in the growth of aggregate computing performance
• server energy use was in 2007 expected to double every 4-5 years
• 2016: server electricity use in USA expected (in 2007 forecast) to surpass 5 percent of the total U.S. generating capacity
• Green Computing is important for IT costs as well as for the environment
Situation description & relevant questions
• Computers are everywhere
• Humans, organizations and computers interact tightly and become increasingly interdependent
• Exponential growth in ICT performance/price is the rule-of-the-game
• What are the limits to growth?
• Which are the “drivers” of ICT development?
• Which system components are central for understanding future ICT developments?
Electricity and computers
A timeline in roughly 50-year spans (1864 – 1912 – 1962 – 2012):
• 1864 Maxwell’s theory of electromagnetism
• 1886 First induction motor (Tesla)
• 1897 First indoor lighting of a building (Library of Congress, Edison)
• 1912 My grandfather’s farm got electric lighting
• 1932 Electricity had existed “forever”; the Turing machine theory is published
• ~1945 von Neumann’s report on the EDVAC
• 1953 First commercial computer
• ~1962 Widespread public computer centers (15–20 years after the first computers)
• 1982 The PC arrives, personal computing for the masses – for those born after 1980, computers have existed “forever”
• 1992 Internet takes off
• 2012 2–3 year olds play with smart phones and tablet computers – computers and the internet have existed “forever”
Drivers of ICT development
• ICT increases human productivity (40-50% of GNP growth due to applications of ICT)
• Exponential growth of computers’ performance
– 50% Bigger Bang/year for electronics
– (60-70% Bigger Bang/Buck/year for computers)
– Necessary for ”the virtuous cycle”: performance increases lead to rapid replacement of computers, which finances technical development, which produces exponential performance increases, and so on
• ICT systems fit better and better to the human ”body and soul”
– Body: tighter interaction between body and computers, e.g., computer implants, touch-pads, multi-touch screens
– Soul: higher abstraction levels for problem solving (modeling and programming)
Performance increases made old software run faster on new computers than on old ones
• Computers were built to achieve almost total isolation between software and hardware in order to provide software portability
• Reprogramming was not necessary when acquiring new hardware
– Cheaper to reuse old software than to reprogram in order to take advantage of increased computational speed
– New functions could be supported without extra costs
• Windows OS grew from 5 mill lines of code in 1993 to 50 mill in 2003
• Linux OS grew from 59 mill lines of code in 2000 to 273 mill in 2007
• Consequence: ICT systems have been long-lived
– Old software is modified, not replaced
– Software complexity increases with time
– Competence on old software fades away
– Danger of system failures and crashes increases
[Figure: Single-processor performance from 1986 to 2008 as measured by the benchmark suite SPECint2000 (log scale)]
[Figure: Integer application performance (SPECint2000) over time (1985-2010), log scale]
Source: National Research Council, The Future of Computing Performance: Game Over or Next Level?, Nat’l Academies Press, 2010
Hardware performance is not increasing as fast as before!
• After 2004 exponential performance growth seems to have reached a ceiling
• Price/performance is still improving exponentially, but how long will this last?
• Is it ”Game over”?
• Heat generation is a major stumbling block
Two ways of viewing ”performance”
• Time-to-solution
– Time from start of computation until the result is produced
– Example: a weather forecast must be completed before the weather arrives
• Throughput
– Number of solved tasks per time unit
– Example: web search, many independent tasks
• Performance problems usually come in combinations of the two; for example, web search requires
– maximum time-to-solution of less than a second
– large throughput, measured in number of independent queries processed per second
Short course in computer design
Some important challenges:
• Increase performance ~50% annually without generating more power than air cooling can handle
– 130 W per chip for a PC
– 3 W for handheld equipment (mobile phones)
• How to bring relevant data to the processor quickly enough to avoid a waiting/idle processor
• Less than 1% of transistors on a chip are active simultaneously. How to increase this?
CPU Clocking
• Operation of digital hardware is governed by a constant-rate clock
• [Figure: clock signal; within each clock period, data transfer and computation are followed by a state update]
• Performance increases by maximizing the number of instructions within a “tick”
• Instructions are from a standardized repertoire dating back to the x86 instruction set (Intel 8086 CPU from 1978)
Electronic devices modulate current by changing an energy barrier
• [Figure: digital device with input and output, powered between Vsupply and Vgrd]
• Devices operate between the voltage levels Vsupply and Vgrd
• Vsupply represents the energy barrier that electrons must “jump” in order to provide current through the device, that is, for the device to produce an output signal; currently ~1.2 V
• Vgrd represents the energy barrier that prevents electron leakage when the device is inactive; it must be >> the average thermal energy of the electrons, i.e. the barrier must be >> kT/e ~ 26 mV at room temperature, in practice at least 0.2-0.3 V
• All movements of electrons generate heat
A fundamental problem of electronics
• Vgrd does not scale, so it is hard to scale the supply voltage
• The limitation on Vgrd is set by leakage of carriers over an energy barrier
• At room temperature Vgrd >> kT/e ~ 26 mV – k = Boltzmann’s constant, T = temperature in kelvin, e = electron charge
• Contemporary digital circuits operate with – Vgrd > 100-300 mV
– Vsupply~1-1.2 V
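The ~26 mV figure follows directly from physical constants; a minimal check:

```python
# Thermal voltage kT/e at room temperature, matching the slide's ~26 mV figure.
k = 1.380649e-23      # Boltzmann's constant, J/K
e = 1.602176634e-19   # elementary charge, C
T = 300.0             # room temperature, K

kT_over_e_mV = k * T / e * 1000
print(f"kT/e at {T:.0f} K ~ {kT_over_e_mV:.1f} mV")
# Vgrd must sit well above this thermal voltage, hence the 0.2-0.3 V floor
```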
Two major data transport challenges
• Preventing the CPU from waiting for data
– Multiplication A ← B × C requires transport of B and C to the CPU, and of A from the CPU after the multiplication
– Transport time is 2-4 clock ticks across the chip, but 50+ ticks from outside the chip
– Critical issue: transport needed data to the CPU prior to use (while the CPU is busy with other calculations)
• Preventing data from being transported too far (because of the energy used)
– Heat is generated when data is transported
– For 40 nm technology, one CPU operation generates 100 pJ
– Data transport (64-bit words) generates 12.8 pJ per wire-mm
– Data transport from outside the chip generates 10 times as much (approx. 130 pJ)
– If not careful, data transport will generate the lion’s share of the heat
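The slide's pJ figures make the point quantitative; a sketch of the energy budget for a single multiplication, where the wire distances and off-chip operand counts are hypothetical illustrations, not measurements:

```python
# Energy budget sketch for 40 nm technology, using the slide's figures:
# one CPU operation ~100 pJ; moving a 64-bit word costs ~12.8 pJ per mm
# of on-chip wire, and ~130 pJ extra when it crosses the chip boundary.
OP_PJ = 100.0
WIRE_PJ_PER_MM = 12.8
OFF_CHIP_PJ = 130.0

def multiply_energy_pj(wire_mm: float, operands_off_chip: int) -> float:
    """Energy for A <- B x C: one operation plus transport of B, C and A.
    wire_mm is the on-chip distance each of the three words travels;
    operands_off_chip of them additionally cross the chip boundary."""
    transport = 3 * wire_mm * WIRE_PJ_PER_MM + operands_off_chip * OFF_CHIP_PJ
    return OP_PJ + transport

# Operands nearby vs. fetched across long wires from memory:
print(multiply_energy_pj(1.0, 0))   # data close to the ALU: ~138 pJ
print(multiply_energy_pj(10.0, 2))  # long wires, two off-chip operands
```

In the second case transport alone costs several times the 100 pJ of actual computation, which is the slide's point about the lion's share of the heat.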
The heat production dilemma
• Every computation requires electric energy
• Energy becomes heat, which must be removed
• The cost of providing and removing energy must be less than the benefit of increasing the computational performance
• Air cooling is the cheapest → max 130 W/chip
• Liquid cooling can be used for HPCs, but not for PCs, smart phones and moderately sized servers – e.g., what happens when the cooling liquid leaks?
• For single-processor computers, the performance limits were reached around 2004
• And then what? How to continue along the exponential growth path?
Clock rate dominates the energy generation in contemporary CMOS
• Energy generation is determined by the formula
Power ~ (number of gates) × (C_load per gate) × (clock rate) × (Vsupply)²
• Energy generation could be kept constant by lowering Vsupply as the number of gates increased (remember Moore’s law)
• Increases in clock rate soon outpaced all other design parameters, and could not be compensated for by changing the other design parameters
• Max. obtainable clock rate increases with increasing Vsupply
• In the marginal case we may have Power ~ (clock rate)³
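The cubic dependence follows from the formula when Vsupply must scale with the clock rate; a small numeric check (the gate count, load capacitance and voltages are illustrative values, not data for any real chip):

```python
# Dynamic CMOS power, per the slide's formula:
#   Power ~ gates * C_load_per_gate * clock_rate * Vsupply^2
# If Vsupply must be raised in proportion to the clock rate (to keep the
# circuit switching fast enough), power grows with the cube of the clock.

def power(gates: float, c_load: float, clock_hz: float, vsupply: float) -> float:
    return gates * c_load * clock_hz * vsupply ** 2

base = power(1e8, 1e-15, 2e9, 1.2)     # illustrative numbers only
doubled = power(1e8, 1e-15, 4e9, 2.4)  # 2x clock with Vsupply scaled 2x
print(doubled / base)  # factor of 8 = 2^3, the (clock rate)^3 marginal case
```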
Multi-core chips
• Pollack’s rule: doubling the hardware gives only about a 50 percent increase in performance (approx. 1990)
• This led to (early 1990s) performance increases through frequency increases; around 2000 this created a serious heat problem
• The relation Power ~ (clock rate)³ added to the problem
• Multi-core chips provided a way out of the heat dilemma: many processors on the same chip cooperate to provide better performance, at a lower clock rate than a single-processor chip
• A chip with many cores will produce the same heat as a single-processor chip, provided that the clock rate is the same
• If several cores can be made to cooperate, the performance/watt may increase many-fold compared to a single-core chip
• Exploiting parallelism became a key issue
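Pollack's rule is often stated as single-core performance growing roughly with the square root of the die area spent on it; an idealized sketch under that assumption (and assuming a perfectly parallel workload, which real programs are not):

```python
import math

# Pollack's rule sketch: single-core performance ~ sqrt(die area),
# while power grows roughly linearly with area at a fixed clock rate.

def core_perf(area: float) -> float:
    return math.sqrt(area)

# Doubling a single core's hardware: ~41% more performance
# (the slide's "about a 50 percent increase").
print(core_perf(2.0) / core_perf(1.0))

# The same doubled area spent on two cooperating cores, on a perfectly
# parallel task, yields 2x the original performance at similar power:
print(2 * core_perf(1.0))
```

This is why, once clock rates hit the heat wall, spending transistors on more cores rather than bigger cores became the better performance-per-watt bargain.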
Fuller & Millett: «Computing Performance: Game Over or Next Level?», IEEE Computer, January 2012, pp. 31-38
Inside the Processor
• AMD Barcelona: 4 processor cores (4 separate computers)
• Two caches per core (L1 and L2)
• One shared cache (L3)
An Experiment in Multi-core efficiency
Lien et al.: Case Studies of Multi-core Energy efficiency in Task Based Programs
Proc. HiPEAC3 – Gøteborg 24/4-2012
Compute intensive program
• CEPIS - paper
Data intensive program
Multi-core HW/SW interplay and energy efficiency — examples and ideas
Lasse Natvig, CARD group, Dept. of Computer Science (IDI), NTNU & HPC section, NTNU – HiPEAC3, Gøteborg 24/4-2012
http://research.idi.ntnu.no/multicore
Three major paths to improving price/performance ratio
• New computer architectures
– More effective utilization of silicon-based circuits, e.g., multi-core chips
– Special purpose computers, e.g. graphics processors
– Quantum computers, bio-computers, nano-computers
• New computer devices – Quantum devices, nano-scale devices
– Optical devices
– Bio devices
• Software/languages for parallel programming
New Computer architectures: Special purpose vs. general purpose computer designs
• General purpose computers
– Can be programmed
– Can be debugged
– Are based on standard electronics and produced in vast quantities
• Special purpose computers
– Have much more logic hardwired in the electronic circuitry
– Use much less energy than general purpose computers
– Must be produced in vast quantities to offset the very high engineering costs of producing the required electronics
• Market considerations determine the balance between special purpose and general purpose solutions
New Devices built on new principles: Quantum effects in next generation?
• E.g., the tunneling field-effect transistor
– the gate is controlled by quantum tunneling rather than by thermal injection
– reducing the gate voltage from ~1 volt to 0.2 volts
– reducing power consumption by up to 100x
• Optical logical gates to replace electronic circuitry? Probably no energy savings replacing electrons with photons
• Nano-materials? Graphene based transistors and logical gates are reported to work in laboratory scale (summer of 2011)
• Bio devices: Can we learn from the life-sciences and build computers based on DNA principles?
Parallel processing is not new, but is now more important than before
• Search engine design
– How can Google deliver search results for all of the world’s data in a fraction of a second?
• Distribute data on a large number of computers
• Parallel search on all of the computers simultaneously
– A large number of search questions can be processed simultaneously
• Large-scale simulations
– Matrix manipulation, parallelization
• 1994 proposal for a Fortran extension for parallel programming (of matrix operations)
– Lacked good diagnostic tools & lacked portability
– Did not fly
• 2009 proposal for Coarray Fortran
Amdahl’s Law
Amdahl’s law sets the limit to which a parallel program can be sped up:
• Programs can be thought of as containing one or more parallel sections of code that can be sped up with suitably parallel hardware, and a sequential section that cannot be sped up.
• The faster the parallel sections of the code run, the more the remaining sequential code looms as the performance bottleneck.
• In the limit, if the parallel section is responsible for 80 percent of the run time, and that section is sped up infinitely (so that it runs in zero time), the other 20 percent now constitutes the entire run time.
• The program would therefore have been sped up by a factor of 5, but after that no amount of additional parallel hardware will make it go any faster.
• Amdahl’s law: Speedup = 1 / ((1 − P) + P/N), where P is the proportion of the code that runs in parallel and N is the number of processors.
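The formula is easy to evaluate directly; a minimal sketch showing the factor-of-5 ceiling for an 80% parallel program:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: p = parallel fraction of the code, n = processors."""
    return 1.0 / ((1.0 - p) + p / n)

print(amdahl_speedup(0.8, 4))      # 4 processors: well short of the limit
print(amdahl_speedup(0.8, 10**9))  # approaches the 1/(1-p) = 5 ceiling
```

With p = 0.8, four processors give only a 2.5x speedup, and even a billion processors cannot push past 5x.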
Pitfall: Amdahl’s Law
• Improving one aspect of a computer and expecting a proportional improvement in overall performance
• T_improved = T_affected / improvement factor + T_unaffected
• Example: multiply accounts for 80 s of a 100 s run time (80% of the work can be parallelized). How much improvement in multiply performance is needed to get a 5× overall speedup?
– 100/5 = 80/n + 20, i.e. 20 = 80/n + 20 → Can’t be done!
• Corollary: make the common case fast
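The slide's worked example (multiply accounts for 80 s of a 100 s run; a 5× overall speedup is unreachable) can be checked by solving the T_improved relation for the required improvement factor; a sketch:

```python
def required_factor(t_total: float, t_affected: float, target_speedup: float):
    """Improvement factor n needed so that t_affected/n + t_unaffected
    equals t_total/target_speedup; returns None when infeasible."""
    t_unaffected = t_total - t_affected
    budget = t_total / target_speedup - t_unaffected
    if budget <= 0:
        return None  # even infinite improvement cannot reach the target
    return t_affected / budget

print(required_factor(100, 80, 4))  # 100/4 = 80/n + 20 -> n = 16
print(required_factor(100, 80, 5))  # budget is zero -> can't be done!
```

A 4× overall speedup is achievable with a 16× faster multiply; 5× is not, because the 20 s of unaffected time already consumes the entire time budget.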
Challenges to increasing performance and energy efficiency through parallelism
• Finding independent operations
• Communicating between operations
• Preserving locality between operations
• Synchronizing operations
• Balancing the load represented by the operations among the system resources
Some characteristics of contemporary high-performance computers (HPC)
• NTNU’s new (anno 2012) high-performance computer (Vilje)
– 24,000 processor cores, 8 cores/chip, 2 chips/node, 2.6 GHz
– 470 Teraflops ~ ½ Petaflop
– Total cost 55 mill NOK → ~15 €/Gflops
– Power consumption 450 kW + 350 kW for cooling ~ the average personal electricity use of 1000 Norwegians (heating, hot water, electrical appliances)
• NTNU’s previous HPC (Njord) was installed in 2006
– 7.5 Teraflops in 2006
– Total cost 30 mill NOK → 4000 NOK/Gflops (~500 €/Gflops)
– Power consumption 150 kW
• Over 6 years (back-of-the-envelope calculations)
– Price/Gflops has decreased by 97.5% → performance/price doubled every ~12 months over 6 years, every ~15 months over 26 years
– Power consumption/Gflops has decreased by 95+% → half-time ~15-16 months
Recapture of heat energy
• Nearly 100% of the electrical energy for HPC computing may be recaptured as hot water
• If the hot water can be used for heating buildings in cold climates the energy costs for computing can be greatly decreased
• Should High Performance Computers be situated close to facilities where garbage is burnt to produce hot water for heating?
Exa-scale computers?
• Exponential performance/price growth is still very much alive
• Exa-scale computers may become heat-generating monsters, assuming current computer design technology and no recapture of energy
– 1 exaflop = 1000 petaflops ~ 2000 current NTNU HPCs
– Current technology requires 2000 × 400 kW per exaflop ~ 7 TWh/year
– Assuming a half-time of 15 months for power consumption, an exaflop computer will need
• in 2020: ~100 GWh/year
• in 2028: ~13 GWh/year
• Exa-scale computers are within reach if current computer development continues!
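The 7 TWh/year baseline follows from the slide's own figures, and the 2020 projection can be reproduced by applying the 15-month half-time; a back-of-the-envelope sketch:

```python
# Back-of-the-envelope for the exaflop projections, using slide figures:
# ~2000 machines of the 2012 NTNU HPC class at ~400 kW each.
HOURS_PER_YEAR = 8760

power_w = 2000 * 400e3                         # ~0.8 GW with 2012 technology
baseline_twh = power_w * HOURS_PER_YEAR / 1e12
print(f"2012 technology: ~{baseline_twh:.1f} TWh/year")  # ~7 TWh/year

# Power per flop halves every ~15 months; project 96 months ahead to 2020.
halvings = 96 / 15
gwh_2020 = baseline_twh * 1000 / 2 ** halvings
print(f"2020: ~{gwh_2020:.0f} GWh/year")  # on the order of the slide's ~100
```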
Game over?
• Exponential growth in peak performance computing will probably continue for next 10 years using silicon technology: Moore’s law is still very much alive, at least down to 10-12 nm
• Increased use of special purpose processors will ensure exponential growth beyond that
• Parallel programming will emerge on a larger scale when necessary, and contribute to exponential growth beyond next 10 years
• New devices built on new materials will most probably emerge within the next 10 years
Myhrvold’s “laws of software”
In 1997, Nathan Myhrvold, former chief technology officer of Microsoft, postulated four “laws of software”:
• 1. Software is a gas. Software always expands to fit whatever container it is stored in.
• 2. Software grows until it becomes limited by Moore’s law. The growth of software is initially rapid, like gas expanding, but is inevitably limited by the rate of increase in hardware speed.
• 3. Software growth makes Moore’s law possible. People buy new hardware because the software requires it.
• 4. Software is limited only by human ambition and expectation. We will always find new algorithms, new applications, and new users.
Thanks for listening