© Michel Dubois, Murali Annavaram, Per Stenström. All rights reserved.
Embedded Computer Architecture 5SAI0
Simulation - chapter 9 -
Luc Waeijen, 16 Nov 2015

How to study a computer system
• Methodologies
  ➢ Construct a hardware prototype
  ➢ Mathematical modeling
  ➢ Simulation

Construct a hardware prototype
• Advantages
  ➢ Runs fast
• Disadvantages
  ➢ Takes long time to build
    - RPM (Rapid Prototyping engine for Multiprocessors) Project @ USC; took a few graduate students several years
  ➢ Expensive
  ➢ Not flexible

Mathematically model the system
• Use analytical modeling
  ➢ Probabilistic
  ➢ Queuing
  ➢ Markov
  ➢ Petri Net
• Advantages
  ➢ Very flexible
  ➢ Very quick to develop
  ➢ Runs quickly
• Disadvantages
  ➢ Can not capture effects of system details
  ➢ Computer architects are skeptical of models
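As an illustration of the queuing style of analytical modeling, the sketch below evaluates the textbook M/M/1 formula for mean response time, 1 / (mu - lambda). The memory-controller scenario and its rates are hypothetical numbers chosen for the example, not values from the course.

```python
def mm1_utilization(arrival_rate: float, service_rate: float) -> float:
    """Fraction of time the single server is busy (rho = lambda / mu)."""
    return arrival_rate / service_rate


def mm1_mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request spends in an M/M/1 system (waiting + service)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical memory controller: it can serve 0.10 requests per cycle
# and receives 0.08 requests per cycle on average.
utilization = mm1_utilization(0.08, 0.10)
latency = mm1_mean_response_time(0.08, 0.10)
print(f"utilization = {utilization:.0%}, mean latency = {latency:.0f} cycles")
```

The model answers instantly, which is the "runs quickly" advantage, but it knows nothing about banks, row buffers or bursty traffic, which is the "can not capture effects of system details" disadvantage.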

Simulation
• Write a program that mimics system behavior
• Advantages
  ➢ Very flexible
  ➢ Relatively quick to develop
• Disadvantages
  ➢ Runs slowly (e.g., 30,000 times slower than hardware)

Most popular research method
• Simulation is chosen by MOST research projects
• Why?
  ➢ Mathematical model is NOT accurate
  ➢ Building prototype is too time-consuming and too expensive for academic researchers

Simulation Bottleneck
• 1 GHz = 1 billion cycles per second
• Simulating one second of execution on a future machine = simulating 1B cycles!!
• Simulating 1 cycle of the target = 30,000 cycles on the host
• 1 second of target simulation = 30,000 seconds on the host = 8.3 hours
• SPEC CPU2000 (CPU2K) benchmarks run for a few hours natively
• Speed is much worse when simulating CMP (chip multiprocessor) targets!!
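The arithmetic on this slide can be reproduced with a few lines. This is a minimal sketch; it assumes, as the slide implicitly does, that the host also runs at 1 GHz. The function and parameter names are made up for the example.

```python
def host_seconds(target_seconds: float,
                 target_hz: float,
                 host_hz: float,
                 host_cycles_per_target_cycle: float) -> float:
    """Wall-clock time the host needs to simulate `target_seconds` of target time."""
    target_cycles = target_seconds * target_hz
    host_cycles = target_cycles * host_cycles_per_target_cycle
    return host_cycles / host_hz


# Assumption: host and target both run at 1 GHz; slowdown factor of 30,000.
seconds = host_seconds(1.0, 1e9, 1e9, 30_000)
hours = seconds / 3600
print(f"{seconds:.0f} s on the host, about {hours:.1f} hours")
```

With the same slowdown, a benchmark that runs natively for a few hours would need years of host time, which is why the following slides trade accuracy for speed.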

How to overcome simulation bottleneck
• Trade accuracy for simulation speed
• Abstraction levels, from most detail (slowest) to least detail (fastest):
  ➢ Gate level (RTL)
  ➢ Cycle accurate
  ➢ Functional level (ISA)
  ➢ Model based approximation
• This trade-off has resulted in a plethora of simulators

Tool classification
• OS code execution
  ➢ System-level (complete system)
    - Does simulate behavior of an entire computer system, including OS and user code
    - Examples:
      – Simics
      – SimOS
  ➢ User-level
    - Does NOT simulate OS code
    - Does emulate system calls
    - Examples:
      – SimpleScalar
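The user-level idea, "does NOT simulate OS code, does emulate system calls", can be sketched with a toy interpreter. The two-opcode program format below is invented for illustration and does not correspond to SimpleScalar or any real tool.

```python
import io


def run_user_level(program, out):
    """Interpret a toy user program and return the number of emulated system calls.

    When the program issues a system call, no target OS code is simulated:
    the simulator performs the requested service directly on the host.
    """
    emulated_syscalls = 0
    for opcode, arg in program:
        if opcode == "nop":
            continue  # ordinary user instruction, nothing to do in this toy
        if opcode == "syscall_write":
            out.write(arg)  # service carried out by the host, not by a simulated OS
            emulated_syscalls += 1
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return emulated_syscalls


buffer = io.StringIO()
count = run_user_level([("nop", None), ("syscall_write", "hello\n")], buffer)
print(count, repr(buffer.getvalue()))
```

A system-level simulator would instead execute the target kernel's own write path instruction by instruction, together with device models.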

Tool classification
• Simulation detail
  ➢ Instruction set
    - Does simulate the function of instructions
    - Does NOT model detailed micro-architectural timing
    - Examples:
      – Simics
  ➢ Micro-architecture
    - Does clock cycle level simulation
    - Does speculative, out-of-order multiprocessor timing simulation
    - May NOT implement functionality of full instruction set or any devices
    - Examples:
      – SimpleScalar
  ➢ RTL
    - Does logic gate-level simulation
    - Examples:
      – Synopsys
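The difference between instruction-set and micro-architecture detail can be sketched as follows. The toy ISA and the fixed latency table are assumptions made for this example; a real micro-architecture simulator models pipelines, caches and speculation rather than a per-opcode constant.

```python
# Toy program: (opcode, destination, operand_a, operand_b). Invented ISA.
PROGRAM = [
    ("li", "r1", 5, None),      # r1 = 5
    ("li", "r2", 7, None),      # r2 = 7
    ("mul", "r3", "r1", "r2"),  # r3 = r1 * r2
    ("add", "r4", "r3", "r1"),  # r4 = r3 + r1
]

# Assumed latencies in cycles, standing in for a real timing model.
LATENCY = {"li": 1, "add": 1, "mul": 3}


def execute(inst, regs):
    """Functional step: update architectural state only, no notion of time."""
    opcode, dest, a, b = inst
    if opcode == "li":
        regs[dest] = a
    elif opcode == "add":
        regs[dest] = regs[a] + regs[b]
    elif opcode == "mul":
        regs[dest] = regs[a] * regs[b]
    else:
        raise ValueError(f"unknown opcode: {opcode}")


def simulate(program, with_timing=False):
    """Run the program; optionally accumulate cycles from the latency table."""
    regs, cycles = {}, 0
    for inst in program:
        execute(inst, regs)
        if with_timing:
            cycles += LATENCY[inst[0]]
    return regs, (cycles if with_timing else None)


functional_regs, _ = simulate(PROGRAM)
timed_regs, total_cycles = simulate(PROGRAM, with_timing=True)
print(functional_regs["r4"], total_cycles)
```

Both runs compute the same register values; only the timed run can say anything about performance.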

Tool classification
• Simulation input
  ➢ Trace-driven
    - Simulator reads a “trace” of instructions captured during a previous execution by software/hardware
    - Easy to implement, no functional component needed
    - Large trace size; no branch prediction
  ➢ Execution-driven
    - Simulator “runs” the program, generating a trace on-the-fly
    - More difficult to implement, but has many advantages
    - Interpreter, direct-execution
    - Examples:
      – Simics, SimpleScalar…
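The two input styles can be contrasted with one small cache model fed in two ways. This is a sketch under stated assumptions: the direct-mapped cache parameters, the base address and the `run_program` stand-in for a functional component are all hypothetical.

```python
def cache_misses(addresses, num_sets=4, block_bytes=16):
    """Direct-mapped cache model: count misses over a stream of byte addresses."""
    tags = [None] * num_sets
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        index, tag = block % num_sets, block // num_sets
        if tags[index] != tag:
            misses += 1
            tags[index] = tag
    return misses


def run_program(n):
    """Stand-in functional component: a loop reading n 4-byte array elements.

    Yields each memory address at the moment it is generated.
    """
    base = 0x1000
    for i in range(n):
        yield base + 4 * i


# Trace-driven: addresses were captured earlier and are replayed from storage.
recorded_trace = list(run_program(16))  # pretend this list was read from a file
trace_misses = cache_misses(recorded_trace)

# Execution-driven: the program "runs" inside the simulator and the addresses
# are consumed on the fly, so no trace has to be stored.
exec_misses = cache_misses(run_program(16))
print(trace_misses, exec_misses)
```

The timing model is identical in both cases; what changes is whether a functional component is needed and whether the full trace must be kept.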

Interval Simulation

Multi-Core Simulation
• Sequential simulation
  ➢ All target cores are simulated in one thread (on one host core)
  ➢ Unified memory hierarchy models simulate resource contention
• Parallel simulation
  ➢ Each target core is simulated in a separate thread
• There is no relation between the number of target cores and the number of cores on the host (except simulation speed)!
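Sequential multi-core simulation can be sketched as a single host loop that visits every target core once per target cycle, with one shared model arbitrating contention. The `SharedBus` and `Core` classes and the fixed-priority arbitration are assumptions for this example, not a description of any particular simulator.

```python
class SharedBus:
    """Unified memory model: at most one request is served per target cycle."""

    def __init__(self):
        self.busy_cycle = -1

    def request(self, cycle):
        if self.busy_cycle == cycle:
            return False  # contention: another core already owns this cycle
        self.busy_cycle = cycle
        return True


class Core:
    """Target core that only issues memory requests (toy workload)."""

    def __init__(self, core_id, num_requests):
        self.core_id = core_id
        self.remaining = num_requests
        self.stall_cycles = 0

    def done(self):
        return self.remaining == 0

    def tick(self, cycle, bus):
        if self.done():
            return
        if bus.request(cycle):
            self.remaining -= 1
        else:
            self.stall_cycles += 1


def simulate_sequential(num_cores, requests_per_core):
    """Simulate all target cores in one host thread, one target cycle at a time."""
    bus = SharedBus()
    cores = [Core(i, requests_per_core) for i in range(num_cores)]
    cycle = 0
    while not all(core.done() for core in cores):
        for core in cores:  # the single host thread visits each target core
            core.tick(cycle, bus)
        cycle += 1
    return cycle, [core.stall_cycles for core in cores]


total_cycles, stalls = simulate_sequential(num_cores=4, requests_per_core=2)
print(total_cycles, stalls)
```

Four target cores are simulated here regardless of how many cores the host has. A parallel simulator would run each `Core` in its own host thread and would then need synchronization around the shared model.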