18
8/28/12 1 CSE 820 Graduate Computer Architecture Richard Enbody 1 st Day 2 Dr. Enbody

Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

1

CSE 820 Graduate Computer Architecture

Richard Enbody

1st Day 2

Dr. Enbody

Page 2: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

2

Why Computer Architecture?

•  Improve coding. •  Knowledge to make architectural choices. •  Ability to understand articles about

architecture. •  Hey, this is interesting and cool stuff!

Our SIGGRAPH demo of the ARM Mali-T604 GPU gave a brief preview of Samsung's upcoming Exynos 5 Dual CPU, but now all the details of the company's next great processor are ready for us to view. Other than that GPU which includes support for up to WQXGA (2,560 x 1,600) resolutions -- perfect for the 11.8-inch P10 mentioned in court filings -- and much more, the white paper uncovered by Android Authority also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to 800MHz with a bandwidth of 12.8GBps, USB 3.0 and SATA III. It also claims the horsepower to decode 1080p video at 60fps in pretty much any codec, stereoscopic 3D plus handle graphics APIs like OpenGL ES 3.0 and OpenCL 1.1. All of this is comes courtesy of a dual-core 1.7GHz ARM Cortex-A15 CPU built on the company's 32nm High-K Metal Gate process and Panel Self Refresh technology that avoids changing pixels unnecessarily to reduce power consumption.

Page 3: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

3

Important Concepts

Amdahl’s Law Law of diminishing returns in the context of computing.

Important Concepts

Locality

Page 4: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

4

Important Concepts

Why we had to move to multicore chips.

Prerequisites

Assume undergraduate computer architecture course such as CSE 420

Page 5: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

5

Text

Computer Architecture: A Quantitative Approach Fifth Edition, John L. Hennessy and David A. Patterson, 2012

Flip

Experiment… I want to try to partially “flip” this class:

– Reading is done before class. – Class has more discussion and exercises

than lecture.

Page 6: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

6

Grading

55% Exercises 40% Final Exam (Wed., Dec. 12, 7:45 - 9:45 AM) 05% Classroom Participation Course grade:

93% and above is a 4.0; 85% - 92% is a 3.5; 80% - 84% is a 3.0, etc.

Exercises

Somewhat experimental: End of the chapter (weekly?) In-class. Other…

Page 7: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

7

Schedule •  Week 1 Ch1 Fundamentals (1 class, first week) •  Week 2 Ch1 continued (1 class, MLK holiday) •  Week 3 Ch1 + Ch2 Memory Hierarchy •  Week 4 Ch2 continued •  Week 5 Ch3 Instruction-Level Parallelism •  Week 6 Ch3 + Ch4 Data-Level Parallelism •  Week 7 Ch4 continued •  Week 8 Ch5 Thread-Level Parallelism •  Week 9 Ch5 + Ch6 Warehouse-Scale Computing •  Week 10 Ch6 continued •  Week 11 Appendix E (online) Embedded Systems •  Week 12 Appendix D (online) Storage Systems •  Week 13 Appendix I (online) Large-Scale Multiprocessors •  Week 14 Appendix I continued •  Week 15 Other

Other? Possibilities

Virtualization support Newest Intel and AMD chips Google architecture Power, Thermal, Skew issues Asynchronous

Page 8: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

8

Use some of Hennessy & Patterson’s slides.

Lawrence Livermore IBM Sequoia

Page 9: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

9

Lawrence Livermore IBM Sequoia

•  Fastest computer in the world (FS12) •  LINPACK: 16.32 petaflops per second •  highly interconnected cluster of •  1,572,864 cores •  1.6 PetaBytes of RAM, 55 PB storage •  3 Gflops/watt •  mounted on 98,304 boards, in 96 racks •  across 318 m2 of floor space. •  768 I/O nodes running Linux

Dongarra •  "The No. 1 machine — I don't know the exact cost,

but I would guess it's on the order of $200 million US," he said.

•  "In the U.S., probably If you were to burn one megawatt for one year, that would cost $1 million US. So if our computer system has 10 MW, it would cost $10 million US to run that computer for one year. That's just the power," he said.

•  Sequoia is 7.9 MW so “only” $7.9 million US per year

Page 10: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

10

Next

•  TACC: 10 PetaFLOP Stampede •  Later: Exa-scale •  Power Goal

– 1000x performance improvement for – 10x power increase

Intel Aubrey Isle 32-core CPU

Page 11: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

11

Page 12: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

12

Page 13: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

13

Page 14: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

14

Page 15: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

15

Intel Xeon Phi •  50 cores •  22-nanometer transistors •  8 GB of GDDR5 memory •  1 Tflop •  co-processor •  Design elements inherited from the Larrabee:

x86 ISA, 512-bit SIMD units, coherent L2 cache, and ultra-wide ring bus connecting processors and memory.

new name for what was Knights Corner

Page 16: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

16

Why?

Intel’s response to GPGPU-based supercomputers running CUDA It is all about Flops per Watt

Meanwhile in your pocket … Motorola Atrix 4G NVIDIA Tegra2 processor (40nm) Dual-core ARM Cortex-A9 CPU Out-of-order processing 1080p HDTV - HDMI 1 GHz L2 cache 1 MB (shared?) L1 32KB I & 32KB D per core 1 GB memory 8-core GPU 12 MP camera with 16X zoom

Page 17: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

17

In your pocket

•  Galaxy S111 phone •  Samsung 1.4 GHz Exynos 4 Quad

processor

Our SIGGRAPH demo of the ARM Mali-T604 GPU gave a brief preview of Samsung's upcoming Exynos 5 Dual CPU, but now all the details of the company's next great processor are ready for us to view. Other than that GPU which includes support for up to WQXGA (2,560 x 1,600) resolutions -- perfect for the 11.8-inch P10 mentioned in court filings -- and much more, the white paper uncovered by Android Authority also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to 800MHz with a bandwidth of 12.8GBps, USB 3.0 and SATA III. It also claims the horsepower to decode 1080p video at 60fps in pretty much any codec, stereoscopic 3D plus handle graphics APIs like OpenGL ES 3.0 and OpenCL 1.1. All of this is comes courtesy of a dual-core 1.7GHz ARM Cortex-A15 CPU built on the company's 32nm High-K Metal Gate process and Panel Self Refresh technology that avoids changing pixels unnecessarily to reduce power consumption.

Page 18: Richard Enbody - Michigan State Universitycse820/lectures/01_introduction.pdf · also mentions support for features like Wi-Fi Display, high bandwidth LPDDR3 RAM running at up to

8/28/12

18

Algorithms A benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988. Fifteen years later (2003) it could be solved in roughly 1 minute, an improvement of roughly 43 million. Roughly 1,000 was due to increased processor speed; a factor of roughly 43,000 was due to improvements in algorithms!

Professor Martin Grötschel Konrad-Zuse-Zentrum für Informationstechnik Berlin.

What Computer Architecture brings to Table •  Other fields often borrow ideas from architecture •  Quantitative Principles of Design

1.  Take Advantage of Parallelism 2.  Principle of Locality 3.  Focus on the Common Case 4.  Amdahl’s Law 5.  The Processor Performance Equation

•  Careful, quantitative comparisons –  Define, quantity, and summarize relative performance –  Define and quantity relative cost –  Define and quantity dependability –  Define and quantity power

•  Culture of anticipating and exploiting advances in technology

•  Culture of well-defined interfaces that are carefully implemented and thoroughly checked