CSCE 432/832 High Performance Processor Architectures An Introduction to CMP Simulators By Dongyuan Zhan 11/18/2009


Outline

• An Overview of CMP Research Tools

• A Detailed Introduction to SIMICS

• A Detailed Introduction to GEMS

• Other Online Resources


An Overview of CMP Research Tools

• CMP Simulators

– SESC (http://users.soe.ucsc.edu/~renau/rtools.html)

– M5 (http://www.m5sim.org/wiki/index.php/Main_Page)

– Simics (https://www.simics.net/)

– GPGPUSim (http://www.ece.ubc.ca/~aamodt/gpgpu-sim/)

• Benchmark Suites

– Single-threaded Applications

» SPEC2000 (www.spec.org)

» SPEC2006

– Multi-threaded Applications

» SPEC OMP2001

» SPECweb2009

» SPLASH2 (http://www-flash.stanford.edu/apps/SPLASH/)

» Parsec (http://parsec.cs.princeton.edu/)


An Overview of CMP Research Tools

• A Taxonomy of Simulation

– Functional vs. Timing

» Functional simulation: simulate the functionalities of a system

» Timing simulation: simulate the timing behavior of a system

– Full-System (FS) vs. Non-FS

» Full-system simulation: like a VM that can boot operating systems

» Syscall emulation: no OS is run; system calls are emulated by the simulator

– Simulation Stages

» Configuration stage: connect cores, caches, DRAMs, interconnects and I/O devices to build up a system

» Fast-forward stage: bypass the initialization stage of a benchmark program without timing simulation

» Warm-up stage: fill the pipelines, branch predictors and caches by executing a certain number of instructions, without counting them in the performance statistics

» Simulation stage: detailed simulation to obtain performance statistics
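The fast-forward, warm-up and simulation stages can be illustrated with a toy model. The sketch below is not taken from any of the simulators named here: `simulate`, the direct-mapped cache and the address trace are all hypothetical, chosen only to show why warm-up state is kept while warm-up statistics are discarded.

```python
def simulate(trace, fast_forward, warm_up, num_sets=4):
    """Toy staged simulation over a list of block addresses.

    Returns (hits, misses) counted during the measured stage only.
    """
    cache = [None] * num_sets  # hypothetical direct-mapped cache: one tag per set
    hits = misses = 0
    for i, addr in enumerate(trace):
        if i < fast_forward:
            # Fast-forward stage: the access is skipped by the timing model,
            # so the cache state is not touched at all.
            continue
        index, tag = addr % num_sets, addr // num_sets
        hit = cache[index] == tag
        cache[index] = tag  # the cache state is updated in both later stages
        if i < fast_forward + warm_up:
            # Warm-up stage: state was updated, but nothing is counted.
            continue
        # Simulation stage: statistics are collected.
        if hit:
            hits += 1
        else:
            misses += 1
    return hits, misses


trace = [0, 1, 2, 3] * 4  # a small loop that touches four blocks repeatedly
print(simulate(trace, fast_forward=4, warm_up=0))  # (8, 4): cold misses are counted
print(simulate(trace, fast_forward=4, warm_up=4))  # (8, 0): cold misses absorbed by warm-up
```

Without warm-up, the four cold misses right after fast-forwarding end up in the measured statistics; with warm-up, the measured stage starts from a populated cache.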


An Overview of CMP Research Tools

• The Commonly Used CMP Simulators

– SESC

» Only supports timing & syscall simulation

» Only supports MIPS ISA

» Integrates with CACTI (power), HotSpot (temperature) and HotLeakage (static power)

» Especially useful in power/thermal research

» CACTI: http://www.cs.utah.edu/~rajeev/cacti6/

» HotSpot: http://lava.cs.virginia.edu/HotSpot/

» HotLeakage: http://lava.cs.virginia.edu/HotLeakage/index.htm


An Overview of CMP Research Tools

• The Commonly Used CMP Simulators

– SIMICS (commercial, but free for academic use)

» Only supports functional & full-system simulation

» Supports multiple ISAs

• SPARC V9 (well supported by public-domain add-on modules)

• x86, Alpha, MIPS, ARM (seldom supported by third-party modules)

» Needs add-on models to do performance & power simulation

• GEMS (http://www.cs.wisc.edu/gems/)

– It has two components for performance simulation:

» Opal: an out-of-order processor core model

» Ruby: a detailed CMP memory hierarchy model

• Simflex (http://parsa.epfl.ch/simflex/)

– It is similar to GEMS in functionality

– It supports statistical sampling for simulation

• Garnet (http://www.princeton.edu/~niketa/garnet.html)

– It supports performance and power simulation of networks-on-chip (NoCs)


An Overview of CMP Research Tools

• The Commonly Used CMP Simulators

– M5

» Supports both functional and timing simulation

» Has two simulation modes: full-system (FS) and syscall emulation (SE)

» Supports multiple ISAs

• ALPHA: well developed, supporting both FS and SE modes

• SPARC: only models a single-core UltraSPARC T1 processor

• X86/MIPS/ARM: in progress

» It models

• Processor Cores

• Memory Hierarchy

• I/O Systems

» Written in C++, Python and SWIG, and fully open source


A Detailed Introduction to Simics

• Directory Tree Organization

– Under the root directory of Simics

» licenses: licenses for functional Simics

» doc: detailed documentation on all aspects of Simics

» targets: Simics scripts that describe specific computer systems

» src: Simics header files for user programming

» amd64-linux: dynamic modules “*.so” that are invoked by Simics to build up modeled computer systems


A Detailed Introduction to Simics

• Key Features of Simics

– Simics can be regarded as a command interpreter

» Command Line Interface (CLI): lets users control Simics

– Simics is quite modular

» It uses Simics scripts to connect different FUNCTIONAL modules (e.g., ISA, dram, disk, Ethernet), which are compiled as “lib/*.so” files, to build up a system.

» The information of all pre-compiled modules can be found in “doc/simics-reference-manual-public-all.pdf”.

» Modules can be written in C/C++, Python and DML.

– Simics has already implemented several specific target systems (defined in scripts) for booting up an operating system

» E.g., Sun’s Serengeti system with UltraSPARC III processors, which is scripted in the directory “targets/serengeti”


A Detailed Introduction to Simics

• Key Features of Simics

– DML, MAIs, APIs and CMDs

» DML: the Simics-specific Device Modeling Language, a C-like programming language for writing device models for Simics using Transaction Level Modeling. For device modeling, DML is simpler than C/C++ and Python.

» MAI:

• the Simics-specific Micro-Architectural Interface, which enables users to define when things happen while letting Simics handle how things happen.

• the add-ons GEMS and SimFlex both use this feature to implement timing simulation.

» APIs: a set of functions that provide access to Simics functionality from script languages in the frontend and from extensions, usually written in C/C++.

» CMDs: the Simics-specific commands used in the CLI to let users control Simics, such as loading modules or running Python scripts.


A Detailed Introduction to Simics

• Using Simics

– Installing Simics

» See “simics-installation-guide-unix.pdf”

– Creating Workspace

» See Chapter 4 of “doc/simics-user-guide-unix.pdf”

– Installing a Solaris OS

» Change the disk capacity by modifying the cylinder-head-sector parameters in “targets/serengeti/abisko-sol*-cd-install1.simics”.

» E.g., a disk of about 32 GB (40980 cylinders x 20 heads x 80 sectors x 512 B per sector) is created by the command

($scsi_disk.get-component-object sd).create-sun-vtoc-header -quiet 40980 20 80

» Enter the workspace just created

» See Chapter 6 of “doc/simics-target-guide-serengeti.pdf”
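The disk-size arithmetic behind the cylinder-head-sector parameters can be checked with a few lines of Python. This is only a worked calculation: the helper name `chs_capacity_bytes` is made up for illustration, and the 512-byte sector size is the one stated on the slide.

```python
SECTOR_BYTES = 512  # sector size used in the slide's example


def chs_capacity_bytes(cylinders, heads, sectors_per_track, sector_bytes=SECTOR_BYTES):
    """Capacity of a disk described by cylinder-head-sector geometry."""
    return cylinders * heads * sectors_per_track * sector_bytes


capacity = chs_capacity_bytes(40980, 20, 80)
print(capacity)                     # 33570816000 bytes
print(round(capacity / 10**9, 2))   # 33.57 (decimal GB)
print(round(capacity / 2**30, 2))   # 31.27 (GiB)
```

The geometry 40980 20 80 therefore gives roughly 33.6 GB, or about 31.3 GiB, which the slide rounds to 32 GB. To target a different size, scale the cylinder count and keep the other two parameters fixed.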


A Detailed Introduction to Simics

• Using Simics

– Modify the Simics script that describes the Serengeti system to enable multiple cores

» Change $num_cpus in “targets/serengeti/serengeti-6800-system.include”

– Booting the Solaris OS in Simics

» Under the workspace directory just created, enter the subdirectory “home/serengeti”

» Type “./simics abisko-common.simics”

» Type “continue”

– Install the SimicsFS (used to communicate with your host system)

» See Section 7.3 of “doc/simics-user-guide-unix.pdf”

– Save a checkpoint, exit, and restart from the saved checkpoint

» Type “write-configuration try.conf”

» Type “exit”

» Type “./simics -c try.conf”


A Detailed Introduction to GEMS

• An Overview of GEMS

[Figure: overview of GEMS. Recoverable labels: Simics; Opal (detailed processor model); microbenchmarks; random tester; deterministic; contended locks; trace file.]

A Detailed Introduction to GEMS

[Figure: Ruby driven by Simics. Recoverable labels: processors P0, P1, P2, P3; Simics time queue; stall()/unstall() for each processor; instructions; Simics in-order processor model; Ruby memory system model.]

A Detailed Introduction to GEMS

• Essential Components in Ruby

– Caches & Memory

– Coherence Protocols

» CMP protocols

• MOESI_CMP_token: M-CMP token coherence

• MSI_MOSI_CMP_directory: 2-level Directory

• MOESI_CMP_directory: higher performing 2-level Directory

» SMP protocols

• MOSI_SMP_bcast: snooping on ordered interconnect

• MOSI_SMP_directory

• MOSI_SMP_hammer: based on AMD Hammer

» User defined protocols using GEMS SLICC
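A coherence protocol is, at its core, a table of (state, event) to next-state transitions for each cache line. The sketch below is a textbook-style MSI table written in Python for illustration only. It is not SLICC syntax and does not reproduce any protocol shipped with GEMS; the event names (`load`, `store`, `other_load`, `other_store`) are hypothetical.

```python
# (current state, event) -> next state, for one cache line in one cache.
# "other_*" events stand for requests observed from another processor.
MSI_TRANSITIONS = {
    ("I", "load"): "S",
    ("I", "store"): "M",
    ("I", "other_load"): "I",
    ("I", "other_store"): "I",
    ("S", "load"): "S",
    ("S", "store"): "M",
    ("S", "other_load"): "S",
    ("S", "other_store"): "I",   # another writer invalidates our shared copy
    ("M", "load"): "M",
    ("M", "store"): "M",
    ("M", "other_load"): "S",    # downgrade: supply data, keep a shared copy
    ("M", "other_store"): "I",
}


def run(events, state="I"):
    """Apply a sequence of events to a line that starts in `state`."""
    for event in events:
        state = MSI_TRANSITIONS[(state, event)]
    return state


print(run(["load", "store", "other_load"]))  # S
print(run(["store", "other_store"]))         # I
```

A real protocol specification also attaches actions (sending messages, allocating buffers) and transient states to each transition; the table above keeps only the stable states.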


A Detailed Introduction to GEMS

• Essential Components in Ruby

– Interconnection Networks

» Automatically generated by default

– Intra-chip network: single on-chip switch

– Inter-chip network: four built-in topologies (next slide)

» Or customized by users

– Defined in *_FILE_SPECIFIED.txt under the directory “$GEMS_ROOT_DIR/ruby/network/simple/Network_Files”


Auto-generated Inter-chip Network Topologies


TopologyType_TORUS_2D

TopologyType_CROSSBAR

TopologyType_HIERARCHICAL_SWITCH

TopologyType_PT_TO_PT
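To make the 2D torus concrete, the sketch below enumerates the links of a rows x cols torus in Python. It only illustrates the wraparound structure; it is not how Ruby builds TopologyType_TORUS_2D, and the function name `torus_2d_links` and the row-major node numbering are assumptions made for this example.

```python
def torus_2d_links(rows, cols):
    """Return the sorted undirected links (a, b), a < b, of a rows x cols 2D torus.

    Nodes are numbered row-major. Each node links to its east and south
    neighbour, with wraparound at the edges.
    """
    links = set()
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            east = r * cols + (c + 1) % cols
            south = ((r + 1) % rows) * cols + c
            for neighbour in (east, south):
                if neighbour != node:  # a dimension of size 1 has no links
                    links.add((min(node, neighbour), max(node, neighbour)))
    return sorted(links)


links = torus_2d_links(4, 4)
print(len(links))   # 32 links: 16 nodes, each with 4 neighbours
print(links[:3])    # [(0, 1), (0, 3), (0, 4)]
```

The link (0, 3) is the wraparound link of the first row; removing the modulo operations would turn the torus into a plain 2D mesh.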


Topology Parameters

– Link latency

» Auto-generated

• ON_CHIP_LINK_LATENCY

• NETWORK_LINK_LATENCY

» Customized

• ‘link_latency:’

– Link bandwidth

» Auto-generated

• On-chip = 10 x g_endpoint_bandwidth

• Off-chip = g_endpoint_bandwidth

» Customized

• Individual link bandwidth = ‘bw_multiplier:’ x g_endpoint_bandwidth

– Buffer size

» Infinite by default

» Customized network supports finite buffering

• Prevent 2D-mesh network deadlock through e-cube restrictive routing

• ‘link_weight’

– Perfect switch bandwidth
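The link-bandwidth rules above reduce to simple multiplications. The Python sketch below restates them as a worked example; the function names are hypothetical, and the endpoint bandwidth value of 1000 is an arbitrary number picked for illustration (the slides do not give its value or unit).

```python
ON_CHIP_MULTIPLIER = 10  # auto-generated on-chip links: 10 x g_endpoint_bandwidth


def auto_link_bandwidth(endpoint_bandwidth, on_chip):
    """Bandwidth of a link in an auto-generated topology."""
    multiplier = ON_CHIP_MULTIPLIER if on_chip else 1
    return multiplier * endpoint_bandwidth


def custom_link_bandwidth(endpoint_bandwidth, bw_multiplier):
    """Bandwidth of a link in a user-specified topology ('bw_multiplier:')."""
    return bw_multiplier * endpoint_bandwidth


g_endpoint_bandwidth = 1000  # assumed value, for illustration only
print(auto_link_bandwidth(g_endpoint_bandwidth, on_chip=True))    # 10000
print(auto_link_bandwidth(g_endpoint_bandwidth, on_chip=False))   # 1000
print(custom_link_bandwidth(g_endpoint_bandwidth, 16))            # 16000
```

In every case the bandwidth is expressed relative to g_endpoint_bandwidth, so changing that single parameter scales all links together.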


A Detailed Introduction to GEMS

• Steps of Using GEMS:

– Choosing a Ruby protocol

– Building Ruby and Opal

– Starting and configuring Simics

– Loading and configuring Ruby

– Loading and configuring Opal

– Running simulation

– Getting results

– I am going to show an example next.


Other Online Resources

• Simics Online Forum

– https://www.simics.net/

• GEMS Mailing List & Archive

– http://lists.cs.wisc.edu/mailman/listinfo/gems-users

• A student wrote some articles about installing and using Simics at

– http://fisherduyu.blogspot.com/

• A note by me

– http://docs.google.com/View?id=dc3xsqzx_131fcdxhnhg
