CSE 820 Graduate Computer Architecture Richard Enbody

  • View
    218

  • Download
    2

Embed Size (px)

Text of CSE 820 Graduate Computer Architecture Richard Enbody

  • CSE 820Graduate Computer ArchitectureRichard Enbody

  • Dr. EnbodyBorn and raised in NH and MEFormer High School Math TeacherAt MSU since 1987ResearchComputer SecurityComputer ArchitectureHockey and squash player

  • ObjectivesIn this course students will study advanced concepts in computer architecture. The emphasis is on modern processor design, but will include some multiprocessor design. Roughly half the time will be spent with material related to the textbook; the remainder will be material not in the text. Research papers will be assigned to be read and analyzed. Homework and/or project will involve some programming.

  • PrerequisitesAssume undergraduate computer architecture course such as CSE 420

  • Grading30% Homework & Project30% Midterm Exam (tentative: Oct. 19 in class)35% Final Exam (Dec 12, 7:45 - 9:45 AM)05% Classroom ParticipationCourse grade: 93% and above is a 4.0; 85% - 92% is a 3.5; 80% - 84% is a 3.0, etc.

  • ScheduleFirst half: textMidtermSecond half: cool architecture stuffFinalIn-between: readings, writings, programming, term paper

  • Cool Stuff?PossibilitiesVirtualization supportIBM Cell processorMulti-coresGoogle architecturePower, Thermal, Skew issuesAsynchronousGraphic processing

  • Homework + ProjectOne-page overview of assigned readingMulticore programmingTerm paper: graphics cards as compute engines

  • ChoiceCover old textOr try to do the new one

  • Use Pattersons slides (author)

  • EECS 252 Graduate Computer Architecture Lec 1 - Introduction

    David PattersonElectrical Engineering and Computer SciencesUniversity of California, Berkeley

  • OutlineComputer Science at a CrossroadsComputer Arch. vs. Instruction Set Arch.What Computer Architecture brings to table

  • Crossroads: Conventional Wisdom in Comp. ArchOld Conventional Wisdom: Power is free, Transistors expensiveNew Conventional Wisdom: Power wall Power expensive, Xtors free (Can put more on chip than can afford to turn on)Old CW: Sufficiently increasing Instruction Level Parallelism via compilers, innovation (Out-of-order, speculation, VLIW, )New CW: ILP wall law of diminishing returns on more HW for ILP Old CW: Multiplies are slow, Memory access is fastNew CW: Memory wall Memory slow, multiplies fast (200 clock cycles to DRAM memory, 4 clocks for multiply)Old CW: Uniprocessor performance 2X / 1.5 yrsNew CW: Power Wall + ILP Wall + Memory Wall = Brick WallUniprocessor performance now 2X / 5(?) yrs Sea change in chip design: multiple cores (2X processors per chip / ~ 2 years)More simpler processors are more power efficient

  • Crossroads: Uniprocessor Performance VAX : 25%/year 1978 to 1986 RISC + x86: 52%/year 1986 to 2002From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October, 2006

    Chart3

    1

    1.5

    5

    9

    13

    18

    24

    51

    80

    117

    183

    280

    481

    648.9682539683

    992.5396825397

    1267.3968253968

    1778.9365079365

    2584.4206349206

    4195.3888888889

    5363.5317460318

    5764.3650793651

    6504.9523809524

    25%/year

    52%/year

    ??%/year

    Performance (vs. VAX-11/780)

    Sheet1

    Data points (not linked)yearPerf TargetCAGR

    YearRel. Perf.Computer1978125%

    19781VAX-11/780198656.052%VAX-11/780

    19841.5VAX-11/785200242004,059.520%VAX-11/785

    19865VAX 8700200565007,257.6VAX 8700

    19879Sun-4/260Sun-4/260

    198813MIPS M/120MIPS M/120

    198918MIPS M2000MIPS M2000

    199024IBM RS6000/540IBM RS6000/540

    199151HP 9000/750HP PA-RISC, 0.05 GHz

    199280Digital 3000 AXP/500Alpha 21064, 0.2 GHz

    1993117IBM POWERstation 100PowerPC 604, 0.1 GHz

    1994183Digital Alphastation 4/266Alpha 21064A, 0.3 GHz

    1995280Digital Alphastation 5/300Alpha 21164, 0.3 GHz

    1996481Digital Alphastation 5/500Alpha 21164, 0.5 GHz

    1997649AlphaServer 4000 5/600, 600 MHz 21164(was 1140 for 600 MHz 21264)Alpha 21164, 0.6 GHz

    1998993Digital AlphaServer 8400 6/575, 575 MHz 21264Alpha 21264, 0.6 GHz

    19991,267Professional Workstation XP1000, 667 MHz 21264AAlpha 21264A, 0.7 GHz

    20001,779Intel VC820 motherboard, 1.0 GHz Pentium III processorIntel Pentium III, 1.0 GHz

    20012,584AMD Epox 8KHA+ Motherboard, AMD Athlon (TM) XP 1900+, 1.6 GHzAMD Athlon, 1.6 GHz

    20024,195Intel D850EMVR motherboard (3.06 GHz, Pentium 4 processor with Hyper-Threading Technology)Intel Pentium 4, 3.0 GHz

    20035,364AMD ASUS SK8N Motherboard, AMD Opteron (TM) 148, 2.2 GHzAMD Opteron, 2.2 GHz

    20045,764HP ProLiant BL20p G3 (3.6GHz, Intel Xeon)Intel Xeon, 3.6 GHz

    20056,505Fujitsu Siemens Computers, PRIMERGY BX620 S2, 64-bit Intel Xeon 3.60 GHz64-bit Intel Xeon, 3.6 GHz

    Sheet1

    1

    1.5

    5

    9

    13

    18

    24

    51

    80

    117

    183

    280

    481

    648.9682539683

    992.5396825397

    1267.3968253968

    1778.9365079365

    2584.4206349206

    4195.3888888889

    5363.5317460318

    5764.3650793651

    6504.9523809524

    25%/year

    52%/year

    20%/year

    Performance (vs. VAX-11/780)

    Sheet2

    Sheet3

  • Sea Change in Chip DesignIntel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm2 chip Processor is the new transistor? RISC II (1983): 32-bit, 5 stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm2 chip125 mm2 chip, 0.065 micron CMOS = 2312 RISC II+FPU+Icache+DcacheRISC II shrinks to ~ 0.02 mm2 at 65 nmCaches via DRAM or 1 transistor SRAM (www.tram.com) ?Proximity Communication via capacitive coupling > 1 TB/s? (Ivan Sutherland @ Sun / Berkeley)

  • Dj vu all over again?Multiprocessors imminent in 1970s, 80s, 90s, todays processors are nearing an impasse as technologies approach the speed of light..David Mitchell, The Transputer: The Time Is Now (1989)Transputer was premature Custom multiprocessors strove to lead uniprocessors Procrastination rewarded: 2X seq. perf. / 1.5 years We are dedicating all of our future product development to multicore designs. This is a sea change in computingPaul Otellini, President, Intel (2004) Difference is all microprocessor companies switch to multiprocessors (AMD, Intel, IBM, Sun; all new Apples 2 CPUs) Procrastination penalized: 2X sequential perf. / 5 yrs Biggest programming challenge: 1 to 2 CPUs

  • Problems with Sea Change Algorithms, Programming Languages, Compilers, Operating Systems, Architectures, Libraries, not ready to supply Thread Level Parallelism or Data Level Parallelism for 1000 CPUs / chip, Architectures not ready for 1000 CPUs / chipUnlike Instruction Level Parallelism, cannot be solved by just by computer architects and compiler writers alone, but also cannot be solved without participation of computer architectsThis edition of CS 252 (and 4th Edition of textbook Computer Architecture: A Quantitative Approach) explores shift from Instruction Level Parallelism to Thread Level Parallelism / Data Level Parallelism

  • OutlineComputer Science at a CrossroadsComputer Arch. vs. Instruction Set Arch.What Computer Architecture brings to table

  • Instruction Set Architecture: Critical InterfaceProperties of a good abstractionLasts through many generations (portability)Used in many different ways (generality)Provides convenient functionality to higher levelsPermits an efficient implementation at lower levelsinstruction setsoftwarehardware

  • Example: MIPS0r0r1r31PClohiProgrammable storage2^32 x bytes31 x 32-bit GPRs (R0=0)32 x 32-bit FP regs (paired DP)HI, LO, PCData types ?Format ?Addressing Modes?Arithmetic logical Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU, AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUISLL, SRL, SRA, SLLV, SRLV, SRAVMemory AccessLB, LBU, LH, LHU, LW, LWL,LWRSB, SH, SW, SWL, SWRControlJ, JAL, JR, JALRBEq, BNE, BLEZ,BGTZ,BLTZ,BGEZ,BLTZAL,BGEZAL32-bit instructions on word boundary

  • Instruction Set Architecture... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls the logic design, and the physical implementation. Amdahl, Blaauw, and Brooks, 1964-- Organization of Programmable Storage

    -- Data Types & Data Structures: Encodings & Representations

    -- Instruction Formats

    -- Instruction (or Operation Code) Set

    -- Modes of Addressing and Accessing Data Items and Instructions

    -- Exceptional Conditions

  • ISA vs. Computer ArchitectureOld definition of computer architecture = instruction set design Other aspects of computer design called implementation Insinuates implementation is uninteresting or less challengingOur view is computer architecture >> ISAArchitects job much more than instruction set design; technical hurdles today more challenging than those in instruction set designSince instruction set design not where action is, some conclude computer architecture (using old definition) is not where action isWe disagree on conclusionAgree that ISA not where action is (ISA in CA:AQA 4/e appendix)

  • Comp. Arch. is an Integrated ApproachWhat really matters is the functioning of the complete system hardware, runtime system, compiler, operating system, and applicationIn networking, this is called the End to End argumentComputer architecture is not just about transistors, individual instructions, or particular implementationsE.g., Original RISC projects replaced complex instructions with a compiler + simple instructions

  • Computer Architecture is Design and AnalysisArchitecture is an iterative process: Sea