Mambo: Advances in PowerPC System Simulation
IBM’s Full System Simulation Solution

Patrick Bohrer, James Peterson, Hazim Shafi
{pbohrer,petersjl,hshafi}@us.ibm.com
IBM Austin Research Laboratory

IBM Research
ISPASS Workshop, March 8, 2003 © 2003 IBM Corporation


Mambo Tutorial | ISPASS Workshop | March 2003 © 2003 IBM Corporation

Overview

• What is Mambo?
• Mambo Internals
• Mambo Demonstrations/Results
• Challenges
• Conclusion

What is Mambo?

• A complete system simulator for PowerPC systems
  – Embedded (405, 440, 750)
  – Server (64-bit Apache and others)
  – Game (Cell/STI)
  – Supercomputer (BlueGene/L)
• Modular, configurable infrastructure
  – Basic: Processor(s), memory, caches
  – I/O: Disk, Ethernet, buses

What is Mambo? (Continued)

• Features
  – Accurate performance & power modeling (405)
  – Models complex SMP effects (64-bit)
  – I/O interactions
  – Plugs into IBM’s modeling infrastructure
  – Easy-to-use GUI or command-line interface
• Development environment
  – Pre-hardware software development & tuning
  – Provides more visibility into existing hardware systems
  – Alternative to real hardware
  – Debugging and development of system software (OS and Hypervisor)
  – Hardware verification

Mambo History

• PASTA
  – IBM architectural simulator used to evaluate PowerPC compiler output and the VMX extensions
• SimOS-PPC
  – Stanford infrastructure used originally to bootstrap effort within ARL
• Cell/STI and BlueGene/L
  – Need for IBM proprietary tool
• SimOS-PPC → Mambo
  – IBM owned PowerPC extensions but …
  – Lost all code that was written by or evolved from Stanford code

Mambo Stack

[Figure: layered diagram of the Mambo stack. Recoverable labels:]
• Simulated software: Applications (32/64-bit), Linux/PPC, Test Programs (TSTs, etc.), ROM
• Mambo: PowerPC “Simple” ISA Simulator, “Tempo” Cycle Accurate ISA Simulator, Cache Sim (L1 & L2), Memory Cntrl, Memory Sim, UART, Timers, ENET, Int Ctrlr
• Tools: Analysis Tools (Trace, Profile, etc.), Visualization Tools
• Hosts: PowerPC, Intel x86; AIX 4.3.x, MacOS-X, Linux 2.4.X

Mambo Goals

• Fast simulation
  – Simulation speed should depend on the level of detail desired by the user (Emitter)
• Easy to modify and enhance
  – Add new instructions within an hour
  – Add new collectors within a day
  – Add new devices within a week
• One code base
  – All projects feed into one CVS repository
  – All of IBM can benefit from available simulation models

Mambo Value : Simulation Repository

• Configuration Options:
  – VMX extensions
  – Hypervisor
  – 32-bit Desktop
  – 32-bit Embedded
  – 64-bit Server
  – OpenPIC interrupt controller
  – Universal interrupt controller (UIC)
  – PCI/IDE
  – HW and SW managed TLBs
  – HW and SW managed SLBs
  – UARTs
  – New Features*
• We continue to add
  – Devices
  – PowerPC extensions
  – Cycle-accurate models
  – Power models
  – Visualization Tools

[Figure labels: Devices, Processors]

Mambo Uses

• Principal software bring-up platform for some IBM Internal Projects
  – Pre-hardware software development
  – Architecture definition and verification
  – Detailed profiles and statistics associated back to software stack
• Trace and testcase generation tool for hardware models
  – Processor traces, memory traces, etc.
  – Generate TSTs which exhibit OS behavior to be run on hardware models
• System software development & debugging
  – Linux and Hypervisor development platform
• Research tool
  – Basic research in energy modeling (DARPA PAC/C)
  – BlueGene/L (BGLsim)
  – DARPA HPCS PERCS
  – Model of future architecture extensions

Collaboration

• IBM development
• IBM research labs
• Academia
  – PAC/C 1 & 2 (Univ. of Pittsburgh, Vanderbilt)
  – TRIPS (Univ. of Texas)
  – K42 (Univ. of Toronto, CMU, Univ. of Rochester, Univ. of Texas)

Licensing

• Binary licenses are granted to 3rd parties which are collaborating with IBM on various projects
• We assist other IBM teams in getting binary licenses set up if they are willing to support the 3rd party

Mambo Runtime Environment

[Figure: runtime environment with numbered callouts 1 to 5. Recoverable labels: Mambo (tcl/tk/blt/mambo cmds); TCL/Tk/BLT GUI Scripts; mysim (apache model) containing console, disk model, net model, cpu model, memory, ROM; Kernel Boot Image; RAM Disk (Optional); Disk Image; Network Service Daemon; ROM Image (rom.bin)]

Startup TCL File (.gmambo.tcl)

    # Create simulator instance
    sim apache mysim

    # Load boot image
    mysim load elf bootImage

    # Source the GUI scripts
    source $env(EXEC_DIR)/../bin/lib/common/default_gui.tcl

    unix $ ../run_gui
    GUI Enabled
    Licensed Materials – Property of IBM.
    © Copyright IBM Corporation 2001, 2002
    All Rights Reserved
    %

What do you run on Mambo?

• Load firmware into ROM and boot from ROM
  – ROM may load operating system from a simulated device and boot it, or the operating system resides in the ROM
• Load operating system into memory and boot
  – Mambo’s ELF loader will initialize memory and the processor
  – Mambo will catch ROM calls (OpenFirmware) and emulate them
• Load stand-alone applications into memory and run
  – Mambo’s ELF loader will initialize memory and the processor
  – TCL command will enable mode in Mambo where all system calls (sc instructions) are caught and handled by simulator
• TST or AVP testcases
  – Convert testcases into self-testing TCL startup files

Interface Overview

[Figure: Mambo interface layers. Recoverable labels: Graphical User Interface (GUI) Script Library; Tk & BLT; Mambo Specific Commands; TCL Language; Simulation Engine; Emitter; Shared Memory: Circular Buffer of Events; Collectors: Strip Chart Generator, Software profiler, Qtrace generator, Memory Tracer; Trace File]

    % sim ppc405gp foo
    % foo cpu 0 display gpr 2
    0x0000000000000000
    % while { [foo cpu 0 display spr pc] != 0xC00 } { foo cycle 1 }

Mambo Value (Visualization)

[Figure: “L1 Miss Rates 999 MHz 1GB/768MB Heap Java MTRT MP Kernel”. X axis: Processor Cycles (0.00E+00 to 1.20E+10). Y axis: Percentage Misses/Instructions (0 to 3.5). Series: Ins L1 Miss Rate, Data L1 Miss Rate]

• Generate a signature of workload and then run in higher fidelity (slower) mode at key segments.
• Correlate events (like cache misses) back to source and assembly code.

Supporting Efforts

• Operating Systems
  – Device Drivers
  – Linux PPC 32-bit
  – Linux PPC 64-bit
  – SSX
• Other Utilities
  – RAM disk setup
  – Network daemon
• Cross-development Tools
  – Compilers
  – Libraries
  – Binutils
  – GDB

Target: PPC64 Linux; Host: AIX, Linux/x86
Target: PPC32 Linux; Host: AIX, Linux/x86

Mambo Internals

Mambo Internals

• Runs on several hosts
  – Linux, AIX, MacOS-X
  – PowerPC, x86
  – Host-endian, Big-endian, Little-endian
• All written in C
  – Considered C++
  – gcc, xlc, g++
  – -Wall
  – Build with no errors, no warnings
  – Consistent, portable C

User Commands

[Figure: Mambo interface layers. Recoverable labels: Graphical User Interface (GUI) Script Library; Tk & BLT; Mambo Specific Commands; TCL Language; Simulation Engine]

• TCL for commands
• Tk/BLT for GUI
• TCL Commands
  – TCL scripts, for regression
  – C command interpreters
  – Command tables

    % sim ppc405gp foo
    % foo cpu 0 display gpr 2
    0x0000000000000000
    % while { [foo cpu 0 display spr pc] != 0xC00 } { foo cycle 1 }

Command Tables

    mysim trigger set pc 0x4032 "breakpoint mysim"

[Figure: tree of command tables. Recoverable labels, in slide order: config, display, go, step, load, trigger; clear, set, display; mambo; cycle, assoc, pc]

• Consistent Command Processing
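The command line on this slide suggests that each word selects an entry in a table, and that lookup descends one table per word until it reaches a handler. A minimal sketch of that idea follows. All names in it (`COMMAND_TABLE`, `dispatch`, the handlers, the `state` dictionary) are hypothetical and are not Mambo's internals.

```python
# Hypothetical sketch: nested command tables, one level per command word.
# None of these names come from Mambo itself.

def set_pc_trigger(state, args):
    """Leaf handler: remember a script to run when the PC reaches an address."""
    address = int(args[0], 16)
    state["pc_triggers"][address] = args[1]
    return f"trigger set at {address:#x}"

def clear_pc_trigger(state, args):
    """Leaf handler: drop a previously registered PC trigger."""
    address = int(args[0], 16)
    state["pc_triggers"].pop(address, None)
    return f"trigger cleared at {address:#x}"

COMMAND_TABLE = {
    "trigger": {
        "set": {"pc": set_pc_trigger},
        "clear": {"pc": clear_pc_trigger},
    },
}

def dispatch(table, state, words):
    """Walk the tables one word at a time until a callable handler is found."""
    node = table
    for index, word in enumerate(words):
        if word not in node:
            raise KeyError(f"unknown command word: {word!r}")
        node = node[word]
        if callable(node):
            return node(state, words[index + 1:])
    raise KeyError("incomplete command: " + " ".join(words))

state = {"pc_triggers": {}}
result = dispatch(COMMAND_TABLE, state,
                  ["trigger", "set", "pc", "0x4032", "breakpoint mysim"])
```

Because every level is looked up the same way, adding a command means adding a table entry rather than new parsing code, which is one plausible reading of “Consistent Command Processing”.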

(Simulated) Machine creation

• sim apache mysim
• sim ppc405 mysim
• sim ppc750 mysim

[Figure: the tcl command enters a switch that selects build apache, build 405, or build 750. Recoverable device labels, in slide order: console, ROM, PCI; UART, emac, RTC, PCI; UART, RTC, PCI, ROM]

Main Data Structures

[Figure: main data structures. Recoverable labels:]
• Machine State
  – Generic Machine: Thread[ ], Processor[ ], Memory
  – Machine Specific: Memory Mapped Devices (ROM, PCI, RTC, …)
• Processor State
• Thread State

Memory Access Functions

• EA_Read_Memory
  – Translate address
  – PA_Read_Memory
  – PA_Read_Cache_Memory
  – PA_Read_Memory_Bus

[Figure: reads end at Memory or at Memory Mapped Devices (ROM, PCI, RTC, …)]
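The function names on this slide read as a layered read path: translate the effective address, then go through the cache model, then the bus, ending at memory or a memory-mapped device. A minimal sketch under that reading follows. The function names echo the slide labels, but every body, the `Machine` class, the `find_device` helper, the page size, and the choice to bypass the cache for device ranges are assumptions.

```python
# Hypothetical sketch of a layered read path. Function names echo the slide
# labels; all bodies and data structures are assumptions, not Mambo's code.

PAGE_SIZE = 4096  # assumed page size for the toy translation

class Machine:
    def __init__(self):
        self.page_table = {}   # effective page number -> physical page number
        self.cache = {}        # physical address -> cached value
        self.ram = {}          # physical address -> value
        self.devices = []      # (base, size, read_function) tuples

def find_device(machine, pa):
    """Return (base, read_function) for the device claiming pa, else None."""
    for base, size, read_function in machine.devices:
        if base <= pa < base + size:
            return base, read_function
    return None

def pa_read_memory_bus(machine, pa):
    """Last stop: a memory-mapped device if one claims the address, else RAM."""
    device = find_device(machine, pa)
    if device is not None:
        base, read_function = device
        return read_function(pa - base)
    return machine.ram.get(pa, 0)

def pa_read_cache_memory(machine, pa):
    """Serve from the cache model on a hit; fill from the bus on a miss."""
    if pa not in machine.cache:
        machine.cache[pa] = pa_read_memory_bus(machine, pa)
    return machine.cache[pa]

def pa_read_memory(machine, pa):
    """Device ranges bypass the cache model (an assumption of this sketch)."""
    if find_device(machine, pa) is not None:
        return pa_read_memory_bus(machine, pa)
    return pa_read_cache_memory(machine, pa)

def ea_read_memory(machine, ea):
    """Translate the effective address, then read the physical address."""
    page, offset = divmod(ea, PAGE_SIZE)
    if page not in machine.page_table:
        raise LookupError(f"no translation for EA {ea:#x}")
    return pa_read_memory(machine, machine.page_table[page] * PAGE_SIZE + offset)

machine = Machine()
machine.page_table[0x10] = 0x2
machine.page_table[0xFFE00] = 0xFFE00
machine.ram[0x2004] = 0xAB
machine.devices.append((0xFFE00000, 0x100, lambda offset: 0x5A))
ram_value = ea_read_memory(machine, 0x10004)
device_value = ea_read_memory(machine, 0xFFE00010)
```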

Event System

• Event
  – Time of event
  – Function to Execute
  – Opaque Data Pointer
• Timers
• I/O Interrupts
  – Disk
  – Console/Keyboard
• Checkpoints
• User Interaction
• Instruction Execution
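The event triple listed above (time, function, opaque data) maps naturally onto a time-ordered priority queue. The sketch below is one way to build such a queue; `EventQueue`, `add_event`, `run_until`, and the two handlers are hypothetical names, not Mambo's API.

```python
import heapq
import itertools

class EventQueue:
    """Time-ordered queue of (time, function, opaque data) events."""

    def __init__(self):
        self.now = 0
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps same-time events FIFO

    def add_event(self, delay, function, data=None):
        """Schedule function(queue, data) to run delay time units from now."""
        heapq.heappush(self._heap,
                       (self.now + delay, next(self._order), function, data))

    def run_until(self, end_time):
        """Process events in time order up to and including end_time."""
        while self._heap and self._heap[0][0] <= end_time:
            self.now, _, function, data = heapq.heappop(self._heap)
            function(self, data)
        self.now = end_time

log = []

def timer_tick(queue, data):
    log.append((queue.now, "timer", data))
    if data < 2:
        queue.add_event(10, timer_tick, data + 1)  # handlers may schedule new events

def disk_interrupt(queue, data):
    log.append((queue.now, "disk", data))

queue = EventQueue()
queue.add_event(10, timer_tick, 0)
queue.add_event(15, disk_interrupt, "block 7")
queue.run_until(40)
```

The explicit tie-breaker makes the processing order fully deterministic, which a simulator needs if runs are to be repeatable.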

Instruction Execution

1. Translate Instruction Address
2. Fetch Instruction
3. Decode Instruction
4. Execute Instruction
5. Check for Interrupts/Exceptions
   – Reset PC, MSR

[Figure labels: Event, New Event]

Speed

• On 1.2 GHz AMD Athlon Processor
• Simple: 2400 to 3300 Kilo-Instructions Per Second
• Time is spent:
  – Simple instruction execution: 18 to 20 %
  – Address Translation: 18 to 24 %
  – Add Event + Process Event: 7 % + 4 %
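As a rough worked example, the figures above can be turned into a host cost per simulated instruction. The arithmetic assumes the host executes 1.2e9 clock cycles per second and that the percentages apply uniformly. The function and variable names are illustrative only.

```python
HOST_CLOCK_HZ = 1.2e9          # 1.2 GHz host, from the slide
SIMPLE_KIPS = (2400, 3300)     # simulated kilo-instructions per second, from the slide
EVENT_SHARE = 0.07 + 0.04      # add event + process event, from the slide

def host_cycles_per_instruction(host_hz, kips):
    """Host clock cycles spent per simulated instruction."""
    return host_hz / (kips * 1000)

slowest = host_cycles_per_instruction(HOST_CLOCK_HZ, SIMPLE_KIPS[0])
fastest = host_cycles_per_instruction(HOST_CLOCK_HZ, SIMPLE_KIPS[1])
event_cycles_at_slowest = EVENT_SHARE * slowest
```

Under these assumptions the Simple model spends roughly 360 to 500 host cycles per simulated instruction, of which about 11 % goes to event handling.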

Development

• CVS – Source Repository
  – Check out
  – Develop
  – Build
  – Regress
  – Check in
• Regression
  – Development
    • Run 4 platforms (Boot Linux, run ls, exit)
    • Takes about 2m15s
  – Nightly
    • Run all platforms (Boot OS, optionally run apps, exit)
    • Adding architectural tests and machine specific test buckets
    • Takes about 3h22m

Configuration

• Highly Configurable
• config.h : #define of CONFIG options
• Sizes of structures – TLBs, caches, ERATs, …
• 32-bit or 64-bit
• Hypervisor mode, threads
• VMX, floating point, …

Emitter

• With or without emitter code
• Emit macro calls throughout the code

[Figure: Mambo With Emit; Shared Memory: Circular Buffer of Events; Collectors: Strip Chart Generator, Software profiler, Qtrace generator, Memory Tracer; Trace File]
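The figure describes one writer placing events into a circular buffer that several collectors read. The sketch below models that relationship inside a single Python process. It uses no real shared memory, and `EventRing`, `Collector`, and the record tuples are hypothetical.

```python
class EventRing:
    """Fixed-size circular buffer: one writer, any number of independent readers."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.written = 0                      # total records ever written

    def emit(self, record):
        self.slots[self.written % len(self.slots)] = record
        self.written += 1

class Collector:
    """A reader with its own position, so collectors never disturb each other."""

    def __init__(self, ring, handler):
        self.ring = ring
        self.handler = handler
        self.next_index = ring.written        # start at the current write position

    def drain(self):
        """Consume every record written since the last drain."""
        if self.ring.written - self.next_index > len(self.ring.slots):
            raise OverflowError("writer lapped this collector; records were lost")
        while self.next_index < self.ring.written:
            self.handler(self.ring.slots[self.next_index % len(self.ring.slots)])
            self.next_index += 1

ring = EventRing(8)
counts = {}
addresses = []

def count_kinds(record):
    counts[record[0]] = counts.get(record[0], 0) + 1

def trace_memory(record):
    if record[0] in ("MemRead", "MemWrite"):
        addresses.append(record[1])

profiler = Collector(ring, count_kinds)
tracer = Collector(ring, trace_memory)
for record in [("INST", 0x10000468), ("MemWrite", 0x2FF22CC0),
               ("INST", 0x1000046C), ("INST", 0x10000470),
               ("MemRead", 0x2FF22CD8)]:
    ring.emit(record)
profiler.drain()
tracer.drain()
```

This sketch only detects a lapped reader. A real shared-memory design would more likely make the writer wait for the slowest collector.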

Emitter Records

    Reader Started
    READER: Shared Memory Key is 76952
    READER: Writer Data found at 700000000000000
    READER: Reader Data found at 700000020000000
    READER: Reader Waiting for Header in Emitter Data
    READER: Reader Found the Header in Emitter Data
    255: #1 CONFIG: CPUS:2 MEM_SIZE:100663296 CLOCK_FREQ:200 DISK_LAT:10 CPI:3.000000 L1D_BLK_SIZE:128 L1I_BLK_SIZE:32
    0: #2 MSR: 0
    1: #3 MSR: 0
    255: #4 IDLE_RANGE: 0x25270 <-> 0x2533c
    255: #5 PID_CREATE: PID:0 FLAG:512
    255: #6 PID_CREATE: PID:1 FLAG:0
    255: #7 EXEC_LOAD: PID:1 PATHNAME:'init' TXT_EA:0x10000100 TXT_VSID:0x60003018 TXT_LEN:34204 DATA_EA:0x20000398 DATA_VSID:0x6000140A DATA_LEN:4668
    255: #8 EXEC_LOAD: PID:1 PATHNAME:'/usr/lib/libcrypt.a[shr.o]' TXT_EA:0xD01980F8 TXT_VSID:0x60002814 TXT_LEN:2170 DATA_EA:0xF0001528 DATA_VSID:0x60003219 DATA_LEN:316
    …
    0: #84 CYCLE:0 INST: EA:10000468 VSID:2150000 PA:1A67468 INST=90610040
    0: #85 MemWrite: EA:2FF22CC0 VSID:2351000 PA:1BA3CC0 LENGTH:4 VALUE=0
    1: #86 CYCLE:0 INST: EA:11E954 VSID:0 PA:11E954 INST=F8AF00F0
    1: #87 MemWrite: EA:CBA70 VSID:0 PA:CBA70 LENGTH:8 VALUE=3
    1: #88 CYCLE:3 INST: EA:11E958 VSID:0 PA:11E958 INST=F8CF00F8
    1: #89 MemWrite: EA:CBA78 VSID:0 PA:CBA78 LENGTH:8 VALUE=393628
    0: #90 CYCLE:3 INST: EA:1000046C VSID:2150000 PA:1A6746C INST=48000004
    0: #91 CYCLE:6 INST: EA:10000470 VSID:2150000 PA:1A67470 INST=80010058
    0: #92 MemRead: EA:2ff22cd8 PA:1ba3cd8 LENGTH:4 VALUE=100003d8
    1: #93 CYCLE:6 INST: EA:11E95C VSID:0 PA:11E95C INST=F8EF0100
    1: #94 MemWrite: EA:CBA80 VSID:0 PA:CBA80 LENGTH:8 VALUE=0
    1: #95 CYCLE:9 INST: EA:11E960 VSID:0 PA:11E960 INST=F90F0108
    1: #96 MemWrite: EA:CBA88 VSID:0 PA:CBA88 LENGTH:8 VALUE=DEADBEEF
    0: #97 CYCLE:9 INST: EA:10000474 VSID:2150000 PA:1A67474 INST=7C0803A6
    0: #98 CYCLE:12 INST: EA:10000478 VSID:2150000 PA:1A67478 INST=38210050
    …
    255: #3304188 FOOTER
    READER: Emitter #0 exiting
    Reader Terminating

[Callout labels: Sim Event, OS Event, CPU Events]
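A collector that consumes this text form needs to split each record into its source, sequence number, kind, and fields. The parser below is a sketch inferred only from the sample output on this slide. The field names `source`, `seq`, and `kind` are assumptions, as is treating the leading number as a CPU or source identifier.

```python
import re

# Layout inferred from the slide: "<source>: #<sequence> <body>".
RECORD = re.compile(r"^(?P<source>\d+): #(?P<seq>\d+) (?P<body>.*)$")

def parse_record(line):
    """Parse one emitter record line; return None for reader status lines."""
    match = RECORD.match(line.strip())
    if match is None:
        return None
    kind = None
    fields = {}
    for token in match.group("body").split():
        # Fields appear both as KEY:value and KEY=value in the sample output.
        key, _, value = token.replace("=", ":", 1).partition(":")
        if value:
            fields[key] = value
        elif kind is None:
            kind = key                 # first bare label, e.g. "MemWrite:"
    return {
        "source": int(match.group("source")),
        "seq": int(match.group("seq")),
        "kind": kind,
        "fields": fields,
    }

write = parse_record(
    "0: #85 MemWrite: EA:2FF22CC0 VSID:2351000 PA:1BA3CC0 LENGTH:4 VALUE=0")
inst = parse_record(
    "0: #84 CYCLE:0 INST: EA:10000468 VSID:2150000 PA:1A67468 INST=90610040")
status = parse_record("READER: Shared Memory Key is 76952")
```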

Emitter (cont.)

• Any number of collectors can be attached to a single instance of mambo-emit
• By combining the Qtracer and the Stripchart collector, we can generate traces which also provide an overview of activity across the trace

Disk

• PCI
  – IDE
  – Standard device drivers
  – Disk Drives – DiskSim 2.0 for timing
  – File for storage
  – Copy on Write
• “Bogus” Disks
  – Special Linux disk driver
  – Call-thru to Mambo
  – Immediate I/O
  – Copy on Write
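Both disk types above list copy on write, which lets many runs share one disk image file without modifying it. The sketch below shows the general technique with an in-memory overlay. `CopyOnWriteDisk`, its methods, and the 512-byte block size are assumptions, not Mambo's implementation.

```python
import io

class CopyOnWriteDisk:
    """Read-only base image plus an in-memory overlay of modified blocks."""

    def __init__(self, base_image, block_size=512):
        self.base = base_image          # any seekable binary file object
        self.block_size = block_size
        self.overlay = {}               # block number -> bytes written by the guest

    def read_block(self, number):
        """Prefer the overlay; fall back to the untouched base image."""
        if number in self.overlay:
            return self.overlay[number]
        self.base.seek(number * self.block_size)
        data = self.base.read(self.block_size)
        return data.ljust(self.block_size, b"\0")   # past the image end reads as zeros

    def write_block(self, number, data):
        """Store the block in the overlay; the base image is never modified."""
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        self.overlay[number] = bytes(data)

image = io.BytesIO(b"A" * 512 + b"B" * 512)
disk = CopyOnWriteDisk(image)
disk.write_block(1, b"C" * 512)
```

Discarding the overlay restores the original disk contents, which fits the repeatable regression runs described elsewhere in the deck.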

Mambo Devices

[Figure: device paths between Linux/PPC and Mambo. Recoverable labels:]
• Linux/PPC: Bogus Disk Driver, PCI/IDE Drivers
• Mambo: Memory Map (PCI, ROM, UART, IDE); Call Through Support (Bogus Disk, Bogus Net, Dump Stats); Disk Image
• Store to address 0xFFE00000
• Special Illegal Inst gpr3=126

Debugging

• Debug prints for modules
• Selectable at run time
  – TCL command
• Completely repeatable
  – To the cycle
  – Debugging
  – Regression depends on repeatability

Debugging with GDB

[Figure: GNU Debugger connected through a Socket to Mambo. Other labels: Application, Linux/K42, Simulated PowerPC]

Mambo Uses : Software Profiling

Software Profiling

Software Profiling (Continued)

Software Profiling (Continued)

Software Profiling (Continued)

Mambo Uses : Performance and Power Modeling

Performance Simulation: Tempo

• Satisfy a need for performance/architecture research
• Requires different methodology
  – Event-based for speed, concurrency modeling, and timing accuracy
  – Partition system simulator at functional block boundaries
• Leverage Mambo functional infrastructure
  – Add timing-related modifications to existing units
  – Mix functional/cycle-accurate modules
  – Trade off accuracy for speed
• Code reuse
  – Leverage existing functional models when possible

Performance/Power Outline

• Structure:
  – Instruction semantics
  – Pipeline models
  – Functional units
  – Exceptions/Interrupts
  – Instruction/data memory paths and address translation
• PPC405GP model
• Timing validation
• Event-based power model
• Power validation
• Work in progress:
  – Cache-coherent MP support
  – PPC750

Simple vs. Tempo

Simple Main Loop:

    while (1) {
        FetchInstruction;
        DecodeInstruction;
        ExecuteSemantics;
        Set next PC;
        Interrupts? Exceptions?
    }

Snapshot of Event Interleavings in Tempo (the figure marks times T=0 and T=1):

• Move I1 from DCD to EXE
• Fetch next instruction
• Move I2 from fetch buffer to DCD
• Address translation for I4
• Move I3 from EXE to WB
• Predict branch in prefetch buffer
• Access Dcache for I4
• Change next fetch PC
• Flush instructions after branch
• Access Dcache for ....

PowerPC Instruction Semantics

• Simple (functional):
  – One function per instruction (or instruction class)
• Tempo (cycle-accurate):
  – Structural and data dependences
    • Busy bits for architectural registers
    • Support for register renaming
    • Split semantics according to major pipeline stages
    • Functional units (more later)
  – Multiple functions: Issue, writeback
  – Synchronizing instructions
  – Interrupts and exception timing/synch. effects (more later)

Functional Units (Execution Stage)

• Multiple types
  – Load/Store, FXU, FPU etc.
  – Associate type(s) with every instruction
  – Used at issue stage
• Repeat rate (degree of pipelining)
  – Instruction dependent
• Latency
  – Per instruction
• Added to instruction decode table
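The three attributes above (unit type, repeat rate, latency) can be pictured as extra columns in a decode table that the issue stage consults. The sketch below is illustrative only. The `Timing` record, the `schedule` function, and all cycle counts are invented for the example and are not PowerPC figures. Data dependences between instructions are deliberately not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Timing:
    unit: str          # functional unit type the instruction issues to
    latency: int       # cycles from issue until the result is available
    repeat_rate: int   # cycles before the unit can accept another instruction

# Hypothetical numbers for illustration only; not taken from any PowerPC manual.
DECODE_TABLE = {
    "addi": Timing("FXU", latency=1, repeat_rate=1),
    "mullw": Timing("FXU", latency=4, repeat_rate=2),
    "lwz": Timing("LSU", latency=2, repeat_rate=1),
}

def schedule(opcodes):
    """Issue in order, at most one per cycle; return (issue, ready) cycle pairs."""
    unit_free_at = {}
    cycle = 0
    result = []
    for opcode in opcodes:
        timing = DECODE_TABLE[opcode]
        issue = max(cycle, unit_free_at.get(timing.unit, 0))   # structural hazard
        unit_free_at[timing.unit] = issue + timing.repeat_rate
        result.append((issue, issue + timing.latency))
        cycle = issue + 1
    return result

timeline = schedule(["mullw", "mullw", "lwz", "addi"])
```

In this run the second `mullw` cannot issue until cycle 2 because the assumed repeat rate keeps the FXU busy, while its result still arrives a full latency later.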

Pipeline Model

• Implementation specific
  – Similarities possible within an architecture family
• Granularity
  – Tradeoff speed/code complexity
  – One or multiple events?
  – Architecture plays an important role in decision
• Interrupts/Exceptions
• Synchronizing instructions
• Branch prediction

Exceptions and Interrupts

• Follow architecture definitions
• Precise exceptions
  – Implications on pipeline simulation
  – Instruction fetch effects
• Interrupt controller
  – Handle according to architecture semantics
  – Flush pipeline (wait for in-flight instructions if required)
  – Instruction fetch effects
• Synchronizing instructions
  – Share some of these restrictions
  – Not necessary for Simple

Instruction Fetch

• Not an issue for Simple
  – In order, no prefetching, and no branch prediction
• Implementation-specific instruction fetch unit
  – Fetch algorithm
  – Interaction with pipeline
  – Translation faults on prefetches, cache access, etc.
• Branch predictor
  – Not needed in Simple

Address Translation

• Timing cost of address translation
• Miss latency (hardware walks, exception handlers etc.)
• Single/Multi-level translation (caching effects)
  – SLB, TLB, ERAT, etc.
  – Simple uses ERATs for simulation speed
• Some instruction semantics affected
  – Speculative loads/stores
• Interface to cache/memory hierarchy
• Exception generation

Memory Simulation

• Loads/stores generate translation events
• Successful translations generate cache accesses
• Detailed cache models
• Store buffers
• Aspects that may or may not be modeled in detail
  – Bus models
  – Memory models
  – Cache-coherence
• Details needed for out-of-order cores and MP simulation, but not necessary for in-order cores (more later)

Tempo PPC405GP Model

Tempo PPC405GP Model (cont.)

[Figure: PPC405GP pipeline model. Recoverable labels: Instruction Fetch Unit, Branch Predictor; pipeline stages PFB1, PFB0, DCD, EXE, WB; ITLB, DTLB, UTLB; I Cache, D Cache; Functional Units; Register File]

PPC405GP Timing Details

• No performance counters in hardware!
• User’s Manual
  – Pipeline structure
  – Branch prediction
  – Instruction latencies and repeat rates
  – TLB structure and miss latencies
• Challenges
  – Incomplete documentation (e.g., sensitivity to core/bus speeds)
• Pecan Board
  – PPC405GP-based evaluation board
  – SSX kernel
  – Small microbenchmarks to reverse engineer some cases

Example Test Case (load/use bubble)

    loop:
        lwz  r2,0(r3)
        addi r2,r2,1
        lwz  r2,0(r4)
        addi r2,r2,1
        lwz  r2,0(r5)
        addi r2,r2,1
        bctr loop

• R3, R4, R5 initially point to same address (all hit after initial miss)
• Adds are dependent

Example Test Case (load miss)

loop:
    lwz  r2,0(r3)
    addi r2,r2,1
    lwz  r2,0(r4)
    addi r2,r2,1
    lwz  r2,0(r5)
    addi r2,r2,1
    bctr loop

• R3, R4, R5 access different lines that map to same set in 2-way D cache (all miss)
• Adds are dependent (each addi uses the r2 value loaded just before it)
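Why the two test cases behave so differently can be checked with a toy cache model. The sketch below assumes LRU replacement, 32-byte lines, and 128 sets; these parameters and all names are illustrative assumptions, not values taken from the slides. With three lines cycling through one 2-way set, each access evicts the line that is needed next, so every load misses.

```python
from collections import OrderedDict

LINE_BYTES = 32  # assumed line size
NUM_SETS = 128   # assumed number of sets
WAYS = 2         # 2-way set-associative, as stated on the slide


class SetAssocCache:
    """Toy set-associative cache with LRU replacement (an assumption)."""

    def __init__(self):
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def access(self, addr):
        """Return True on a hit, False on a miss, and update LRU order."""
        line = addr // LINE_BYTES
        index, tag = line % NUM_SETS, line // NUM_SETS
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)     # mark as most recently used
            return True
        if len(ways) == WAYS:
            ways.popitem(last=False)  # evict the least recently used line
        ways[tag] = True
        return False


def run_loop(addrs, iterations):
    """Replay the three loads of the loop body for a number of iterations."""
    cache = SetAssocCache()
    return [cache.access(a) for _ in range(iterations) for a in addrs]


SET_STRIDE = LINE_BYTES * NUM_SETS  # addresses this far apart share a set

hit_case = run_loop([0x1000, 0x1000, 0x1000], iterations=2)
miss_case = run_loop([0, SET_STRIDE, 2 * SET_STRIDE], iterations=2)
```

Here `hit_case` contains one miss followed by hits, and `miss_case` contains only misses, matching the two slide annotations.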

Timing Validation
• Compare simulated time with hardware
• Operating system: SSX Kernel
• Hardware: PPC405GP-based Pecan board
• Applications:
  – EEMBC v. 1.0
  – 42 program/dataset combinations
• Timing:
  – Use on-board 405GP timer running at core frequency
  – Both cases use same methodology
  – Disable interrupts during application run

Timing Validation Results
• Average timing error = 0.6%
• Standard deviation = 2.5%
• Error range = -4.6% to 7.1%
• Simulator speed:
  – ~570 KIPS on 1.2 GHz AMD Athlon™ system
• Possible sources of error:
  – Memory details abstracted out (only average latency modeled)
  – Store buffer model (reverse engineered)
  – Rare branch cases (e.g., branch to self)
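Summary figures of this kind (average, standard deviation, range) can be computed from per-benchmark timings as sketched below. The sign convention (simulated minus hardware, relative to hardware), the use of the sample standard deviation, and the input numbers are assumptions for illustration; the slides do not state how the figures were derived.

```python
import statistics


def timing_error_summary(hw_cycles, sim_cycles):
    """Signed percent error of simulated vs. hardware time, per benchmark."""
    errors = [100.0 * (s - h) / h for h, s in zip(hw_cycles, sim_cycles)]
    return {
        "mean": statistics.mean(errors),
        "stdev": statistics.stdev(errors),  # sample standard deviation (assumed)
        "min": min(errors),
        "max": max(errors),
    }


# Made-up cycle counts for three hypothetical benchmarks.
summary = timing_error_summary(
    hw_cycles=[1000, 2000, 4000],
    sim_cycles=[1010, 1960, 4040],
)
```

Because errors are signed, over- and under-estimates cancel in the mean, which is why the range and standard deviation are reported alongside it.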

Power Model: Motivation
• Study operating systems and application power behavior
  – Performance/Architecture/Power trade-offs
  – Validated timing and power
• Tempo Model
  – Cycle accurate simulation infrastructure
  – Event-based power model
  – PowerPC 405GP chosen for first model
    – ARL Pecan board (instrumented for power measurements)
    – National Instruments measurement system
    – Embedded processor (power is a limiting factor)
    – No event counters (event based model challenges)

Generating the Power Model
• $E_{total} = E_{idle} + \sum_{i} (N_i \times E_i)$
  – $E_{idle}$ is idle (Wait State) energy
  – $N_i$ is the number of instances of event $i$
  – $E_i$ is the energy cost of event $i$ (caches, TLB, branches, instructions, etc.)
• Microbenchmarks (run on Pecan board)
  – Fixed number of events, compare to idle
  – Create as many as necessary to isolate event energies (~300; only about 50 are needed)
  – Some details require more tests (more later)
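The energy equation on this slide is a weighted sum over event counts, which can be sketched in a few lines. The event names and per-event costs below are made-up placeholders, not measured 405GP values, and the function name is hypothetical.

```python
def total_energy(e_idle, event_counts, event_energy):
    """E_total = E_idle + sum over events i of N_i * E_i."""
    return e_idle + sum(
        count * event_energy[name] for name, count in event_counts.items()
    )


# Hypothetical per-event energy costs (arbitrary units), one entry per event type.
EVENT_ENERGY = {"load_hit": 1.5, "load_miss": 10.0, "branch": 2.0}

# Event counts as a simulator run might report them (made-up numbers).
counts = {"load_hit": 3, "load_miss": 2}

energy = total_energy(e_idle=100.0, event_counts=counts, event_energy=EVENT_ENERGY)
```

Each microbenchmark in effect fills in one entry of the cost table by holding every other event count fixed.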

Power Modeling Overview

[Figure: power modeling flow. Labeled components: Benchmarks, Tempo PPC405GP, Event Emitter, Circular Buffer, Reader Tool, Event Power Cost Table; PPC405GP Pecan Board, National Instruments Measurement Equipment, LabView.]

Example Microbenchmark
• Core/Bus 200 MHz/66 MHz
• Measurements:
  – Cycles/iteration ($C_{hit}$ and $C_{miss}$)
  – Avg. power ($P_{hit}$ and $P_{miss}$)
  – Time: $T = C \times 5\,\text{ns}$
  – Want $E_{load\_miss}$
• $\Delta E = P_{miss} T_{miss} - P_{hit} T_{hit}$
• $\Delta E = (T_{miss} - T_{hit}) P_{idle} + 3 E_{load\_miss}$
• Solve for $E_{load\_miss}$

loop:
    lwz  r2,0(r3)
    addi r2,r2,1
    lwz  r2,0(r4)
    addi r2,r2,1
    lwz  r2,0(r5)
    addi r2,r2,1
    bctr loop
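Equating the two ΔE expressions on this slide and rearranging gives E_load_miss = (P_miss·T_miss − P_hit·T_hit − (T_miss − T_hit)·P_idle) / 3, where the 3 is the number of loads per iteration. A minimal sketch of that calculation follows; the function name and the input values are hypothetical, and only the 5 ns cycle time and the three loads come from the slide.

```python
CYCLE_NS = 5.0           # 200 MHz core clock, 5 ns per cycle (from the slide)
LOADS_PER_ITERATION = 3  # three lwz instructions in the loop body


def load_miss_energy_nj(c_hit, c_miss, p_hit_w, p_miss_w, p_idle_w):
    """Energy per load miss in nanojoules (watts x nanoseconds)."""
    t_hit_ns = c_hit * CYCLE_NS
    t_miss_ns = c_miss * CYCLE_NS
    # Measured energy difference per iteration between the two runs.
    delta_e_nj = p_miss_w * t_miss_ns - p_hit_w * t_hit_ns
    # Part of the difference explained by the extra time spent at idle power.
    extra_idle_nj = (t_miss_ns - t_hit_ns) * p_idle_w
    return (delta_e_nj - extra_idle_nj) / LOADS_PER_ITERATION


# Made-up measurements: cycles per iteration and average power in watts.
e_load_miss = load_miss_energy_nj(
    c_hit=10, c_miss=40, p_hit_w=1.0, p_miss_w=0.85, p_idle_w=0.5
)
```

The same pattern (run a hit variant and an event variant, subtract, divide by the event count) applies to the other modeled events.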

Modeled Events
• Base (idle) power
  – Power consumption when processor in Wait State
• Average switching due to instructions
  – Does not take ordering or instruction type into account
• Load/Store hits/misses
• TLB manipulation instructions
• Energy cost of different instruction types (ALU, Mult, Div, etc.)
• Branches (decoded in PFB or DCD, mispredictions, etc.)
• Instruction cache hits/misses
• ITLB and DTLB misses (that hit in UTLB)
• Data cache flushes
  – Dirty line replacements

Power Validation
• Validate energy:
  – Sum energy events across an application run
  – Integrate power/time samples measured on hardware
  – Same 42 EEMBC applications (exclude short-running ones)
• Results:
  – Average error = -4.1%
  – Standard deviation = 5.1%
  – Range = -11.3% to 6.6%
• Power validation (runtime behavior)
  – Plot measurement data against simulated power
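The energy comparison pairs a simulated event-energy sum with an integral of measured power samples. The sketch below uses trapezoidal integration and a signed error relative to the measurement; both choices, along with the names and sample values, are assumptions, since the slides say only that the samples are integrated.

```python
def measured_energy_j(times_s, power_w):
    """Trapezoidal integral of sampled power (W) over time (s), in joules."""
    energy = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


def energy_error_percent(simulated_j, measured_j):
    """Signed percent error of simulated energy relative to measured energy."""
    return 100.0 * (simulated_j - measured_j) / measured_j


# Made-up power trace: three samples one second apart.
measured = measured_energy_j([0.0, 1.0, 2.0], [1.0, 1.0, 3.0])
error = energy_error_percent(simulated_j=2.7, measured_j=measured)
```

Non-uniform sample spacing is handled because each interval uses its own `dt`.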

Power Validation

[Figure: cjpeg. Hardware vs. simulation. y-axis: power (watts), 0 to 1.2; x-axis: time (ms), 0 to 295.]

Power Validation

[Figure: fft.sine. Hardware vs. simulation. y-axis: power (watts), 0.5 to 1.2; x-axis: time (ms), 0 to 60.]

Power Validation

[Figure: matrix. Hardware vs. simulation. y-axis: power (watts), 0.5 to 1.4; x-axis: time (ms).]

Power Validation

[Figure: djpeg. Hardware vs. simulation. y-axis: power (watts), 0 to 1.4; x-axis: time (ms), 0 to 241.]

Power Modeling
• Live validated power modeling of PowerPC 405GP system
• Software profile by power or timing

Sources of Error
• Wait State power as the base
  – Active but idle power is different
• Instruction/Data switching
  – Used average value for instruction switching power
  – Due to pipeline/control path energy variations
  – Instruction ordering is important
• Missed some important events?
• Did not isolate events that do not occur together
• Could not isolate store buffer power aspects
• Others?

Future/Current Work
• Detailed shared-memory multiprocessor simulation
  – Snooping and directory-based cache-coherence protocols
  – Other detailed cores
  – Detailed interconnects
• PPC405LP
  – Voltage and frequency scaling support
• Tempo model of PPC750
  – Out-of-order processor
• Linux bringup
• Tempo models of peripherals?
  – Disks
  – Network interface cards
  – Many others

Challenges
• One simulation source repository
  – About 250K lines of mostly C code (including comments)
• Cycle-accurate simulation models
• More IBM development opportunities
• Runtime machine configuration
  – Configure and build machines at command prompt
  – Ability to plug in new models dynamically
• More faithfully model real systems
  – Minimize OS changes needed to run on Mambo
• Verification of models
• More workloads

References
• H. Shafi, P. Bohrer, J. Phelan, C. Rusu, and J. Peterson, “Design and Validation of the Mambo Performance and Power Simulator for PowerPC Systems,” to appear in the IBM Journal of R&D.
• C. May, E. Silha, R. Simpson, and H. Warren (eds.), “The PowerPC Architecture: A Specification for a New Family of RISC Processors,” Morgan Kaufmann Publishers, 1994.
• PPC405GP User’s Manual, IBM, 2000.
• PPC405LP User’s Manual, IBM, 2003.
• G. Ganger, “System Oriented Evaluation of I/O Performance,” Ph.D. Dissertation, U. Michigan, 1995.
• http://mambo.austin.ibm.com (IBM internal)

Questions:
• Contact information:
  – Pat Bohrer (pbohrer@us.ibm.com)
  – James L Peterson (petersjl@us.ibm.com)
  – Hazim Shafi (hshafi@us.ibm.com)
  – ARL: http://www.research.ibm.com/arl