Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
8/28/12
1
CSE 820 Graduate Computer Architecture
Richard Enbody
1st Day 2
Dr. Enbody
8/28/12
2
Why Computer Architecture?
• Improve coding. • Knowledge to make architectural choices. • Ability to understand articles about
architecture. • Hey, this is interesting and cool stuff!
Our SIGGRAPH demo of the ARM Mali-T604 GPU gave a brief preview of Samsung's upcoming Exynos 5 Dual CPU, but now all the details of the company's next great processor are ready for us to view. Other than that GPU which includes support for up to WQXGA (2,560 x 1,600) resolutions -- perfect for the 11.8-inch P10 mentioned in court filings -- and much more, the white paper uncovered by Android Authority also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to 800MHz with a bandwidth of 12.8GBps, USB 3.0 and SATA III. It also claims the horsepower to decode 1080p video at 60fps in pretty much any codec, stereoscopic 3D plus handle graphics APIs like OpenGL ES 3.0 and OpenCL 1.1. All of this is comes courtesy of a dual-core 1.7GHz ARM Cortex-A15 CPU built on the company's 32nm High-K Metal Gate process and Panel Self Refresh technology that avoids changing pixels unnecessarily to reduce power consumption.
8/28/12
3
Important Concepts
Amdahl’s Law Law of diminishing returns in the context of computing.
Important Concepts
Locality
8/28/12
4
Important Concepts
Why we had to move to multicore chips.
Prerequisites
Assume undergraduate computer architecture course such as CSE 420
8/28/12
5
Text
Computer Architecture: A Quantitative Approach Fifth Edition, John L. Hennessy and David A. Patterson, 2012
Flip
Experiment… I want to try to partially “flip” this class:
– Reading is done before class. – Class has more discussion and exercises
than lecture.
8/28/12
6
Grading
55% Exercises 40% Final Exam (Wed., Dec. 12, 7:45 - 9:45 AM) 05% Classroom Participation Course grade:
93% and above is a 4.0; 85% - 92% is a 3.5; 80% - 84% is a 3.0, etc.
Exercises
Somewhat experimental: End of the chapter (weekly?) In-class. Other…
8/28/12
7
Schedule • Week 1 Ch1 Fundamentals (1 class, first week) • Week 2 Ch1 continued (1 class, MLK holiday) • Week 3 Ch1 + Ch2 Memory Hierarchy • Week 4 Ch2 continued • Week 5 Ch3 Instruction-Level Parallelism • Week 6 Ch3 + Ch4 Data-Level Parallelism • Week 7 Ch4 continued • Week 8 Ch5 Thread-Level Parallelism • Week 9 Ch5 + Ch6 Warehouse-Scale Computing • Week 10 Ch6 continued • Week 11 Appendix E (online) Embedded Systems • Week 12 Appendix D (online) Storage Systems • Week 13 Appendix I (online) Large-Scale Multiprocessors • Week 14 Appendix I continued • Week 15 Other
Other? Possibilities
Virtualization support Newest Intel and AMD chips Google architecture Power, Thermal, Skew issues Asynchronous
8/28/12
8
Use some of Hennessy & Patterson’s slides.
Lawrence Livermore IBM Sequoia
8/28/12
9
Lawrence Livermore IBM Sequoia
• Fastest computer in the world (FS12) • LINPACK: 16.32 petaflops per second • highly interconnected cluster of • 1,572,864 cores • 1.6 PetaBytes of RAM, 55 PB storage • 3 Gflops/watt • mounted on 98,304 boards, in 96 racks • across 318 m2 of floor space. • 768 I/O nodes running Linux
Dongarra • "The No. 1 machine — I don't know the exact cost,
but I would guess it's on the order of $200 million US," he said.
• "In the U.S., probably If you were to burn one megawatt for one year, that would cost $1 million US. So if our computer system has 10 MW, it would cost $10 million US to run that computer for one year. That's just the power," he said.
• Sequoia is 7.9 MW so “only” $7.9 million US per year
8/28/12
10
Next
• TACC: 10 PetaFLOP Stampede • Later: Exa-scale • Power Goal
– 1000x performance improvement for – 10x power increase
Intel Aubrey Isle 32-core CPU
8/28/12
11
8/28/12
12
8/28/12
13
8/28/12
14
8/28/12
15
Intel Xeon Phi • 50 cores • 22-nanometer transistors • 8 GB of GDDR5 memory • 1 Tflop • co-processor • Design elements inherited from the Larrabee:
x86 ISA, 512-bit SIMD units, coherent L2 cache, and ultra-wide ring bus connecting processors and memory.
new name for what was Knights Corner
8/28/12
16
Why?
Intel’s response to GPGPU-based supercomputers running CUDA It is all about Flops per Watt
Meanwhile in your pocket … Motorola Atrix 4G NVIDIA Tegra2 processor (40nm) Dual-core ARM Cortex-A9 CPU Out-of-order processing 1080p HDTV - HDMI 1 GHz L2 cache 1 MB (shared?) L1 32KB I & 32KB D per core 1 GB memory 8-core GPU 12 MP camera with 16X zoom
8/28/12
17
In your pocket
• Galaxy S111 phone • Samsung 1.4 GHz Exynos 4 Quad
processor
Our SIGGRAPH demo of the ARM Mali-T604 GPU gave a brief preview of Samsung's upcoming Exynos 5 Dual CPU, but now all the details of the company's next great processor are ready for us to view. Other than that GPU which includes support for up to WQXGA (2,560 x 1,600) resolutions -- perfect for the 11.8-inch P10 mentioned in court filings -- and much more, the white paper uncovered by Android Authority also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to 800MHz with a bandwidth of 12.8GBps, USB 3.0 and SATA III. It also claims the horsepower to decode 1080p video at 60fps in pretty much any codec, stereoscopic 3D plus handle graphics APIs like OpenGL ES 3.0 and OpenCL 1.1. All of this is comes courtesy of a dual-core 1.7GHz ARM Cortex-A15 CPU built on the company's 32nm High-K Metal Gate process and Panel Self Refresh technology that avoids changing pixels unnecessarily to reduce power consumption.
8/28/12
18
Algorithms A benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988. Fifteen years later (2003) it could be solved in roughly 1 minute, an improvement of roughly 43 million. Roughly 1,000 was due to increased processor speed; a factor of roughly 43,000 was due to improvements in algorithms!
Professor Martin Grötschel Konrad-Zuse-Zentrum für Informationstechnik Berlin.
What Computer Architecture brings to Table • Other fields often borrow ideas from architecture • Quantitative Principles of Design
1. Take Advantage of Parallelism 2. Principle of Locality 3. Focus on the Common Case 4. Amdahl’s Law 5. The Processor Performance Equation
• Careful, quantitative comparisons – Define, quantity, and summarize relative performance – Define and quantity relative cost – Define and quantity dependability – Define and quantity power
• Culture of anticipating and exploiting advances in technology
• Culture of well-defined interfaces that are carefully implemented and thoroughly checked