IBM Research
ISPASS Workshop, March 8, 2003 © 2003 IBM Corporation
Mambo: Advances in PowerPC System Simulation
IBM’s Full System Simulation Solution
Patrick Bohrer, James Peterson, Hazim Shafi
{pbohrer,petersjl,hshafi}@us.ibm.com
IBM Austin Research Laboratory
Mambo Tutorial | ISPASS Workshop | March 2003 © 2003 IBM Corporation
Overview
• What is Mambo?
• Mambo Internals
• Mambo Demonstrations/Results
• Challenges
• Conclusion
What is Mambo?
• A complete system simulator for PowerPC systems
  – Embedded (405, 440, 750)
  – Server (64-bit Apache and others)
  – Game (Cell/STI)
  – Supercomputer (BlueGene/L)
• Modular, configurable infrastructure
  – Basic: processor(s), memory, caches
  – I/O: disk, Ethernet, buses
What is Mambo? (Continued)
• Features
  – Accurate performance & power modeling (405)
  – Models complex SMP effects (64-bit)
  – I/O interactions
  – Plugs into IBM’s modeling infrastructure
  – Easy-to-use GUI or command-line interface
• Development environment
  – Pre-hardware software development & tuning
  – Provides more visibility into existing hardware systems
  – Alternative to real hardware
  – Debugging and development of system software (OS and Hypervisor)
  – Hardware verification
Mambo History
• PASTA
  – IBM architectural simulator used to evaluate PowerPC compiler output and the VMX extensions
• SimOS-PPC
  – Stanford infrastructure originally used to bootstrap the effort within ARL
• Cell/STI and BlueGene/L
  – Need for an IBM proprietary tool
• SimOS-PPC → Mambo
  – IBM owned the PowerPC extensions, but …
  – Lost all code that was written by or evolved from Stanford code
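Mambo Stack
[Diagram: the Mambo software stack. Simulated software: applications (32/64-bit), test programs (TSTs, etc.), and Linux/PPC. Mambo components: PowerPC "Simple" ISA simulator, "Tempo" cycle-accurate ISA simulator, cache sim (L1 & L2), memory controller, memory sim, ROM, UART, timers, Ethernet (ENET), and interrupt controller, plus analysis tools (trace, profile, etc.) and visualization tools. Host platforms shown: PowerPC and Intel x86 hardware running AIX 4.3.x, MacOS-X, or Linux 2.4.x.]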
Mambo Goals
• Fast simulation
  – Simulation speed should be a function of the detail desired by the user (Emitter)
• Easy to modify and enhance
  – Add new instructions within an hour
  – Add new collectors within a day
  – Add new devices within a week
• One code base
  – All projects feed into one CVS repository
  – All of IBM can benefit from available simulation models
Mambo Value: Simulation Repository
• Configuration options:
  – VMX extensions
  – Hypervisor
  – 32-bit desktop
  – 32-bit embedded
  – 64-bit server
  – OpenPIC interrupt controller
  – Universal interrupt controller (UIC)
  – PCI/IDE
  – HW and SW managed TLBs
  – HW and SW managed SLBs
  – UARTs
  – New features*
• We continue to add:
  – Devices
  – PowerPC extensions
  – Cycle-accurate models
  – Power models
  – Visualization tools
Mambo Uses
• Principal software bring-up platform for some IBM internal projects
  – Pre-hardware software development
  – Architecture definition and verification
  – Detailed profiles and statistics associated back to the software stack
• Trace and testcase generation tool for hardware models
  – Processor traces, memory traces, etc.
  – Generate TSTs which exhibit OS behavior to be run on hardware models
• System software development & debugging
  – Linux and Hypervisor development platform
• Research tool
  – Basic research in energy modeling (DARPA PAC/C)
  – BlueGene/L (BGLsim)
  – DARPA HPCS PERCS
  – Model of future architecture extensions
Collaboration
• IBM development
• IBM research labs
• Academia
  – PAC/C 1 & 2 (Univ. of Pittsburgh, Vanderbilt)
  – TRIPS (Univ. of Texas)
  – K42 (Univ. of Toronto, CMU, Univ. of Rochester, Univ. of Texas)
Licensing
• Binary licenses are granted to third parties that are collaborating with IBM on various projects
• We assist other IBM teams in getting binary licenses set up if they are willing to support the third party
Mambo Runtime Environment
[Diagram: the Mambo runtime environment. Components shown: Tcl/Tk/BLT and mambo commands; Tcl/Tk/BLT GUI scripts; the startup Tcl file; a kernel boot image; an optional RAM disk; a disk image; a network service daemon; a ROM image (rom.bin); and the simulated machine mysim (apache model) with console, CPU model, memory, disk model, and net model.]

Startup Tcl file (.gmambo.tcl):
    # Create simulator instance
    sim apache mysim
    # Load boot image
    mysim load elf bootImage
    # Source the GUI scripts
    source $env(EXEC_DIR)/../bin/lib/common/default_gui.tcl

Starting the GUI:
    unix $ ../run_gui
    GUI Enabled
    Licensed Materials – Property of IBM.
    © Copyright IBM Corporation 2001, 2002
    All Rights Reserved
    %
What do you run on Mambo?
• Load firmware into ROM and boot from ROM
  – The ROM may load the operating system from a simulated device and boot it, or the operating system may reside in the ROM
• Load the operating system into memory and boot
  – Mambo’s ELF loader initializes memory and the processor
  – Mambo catches ROM calls (OpenFirmware) and emulates them
• Load stand-alone applications into memory and run
  – Mambo’s ELF loader initializes memory and the processor
  – A Tcl command enables a mode in which all system calls (sc instructions) are caught and handled by the simulator
• TST or AVP testcases
  – Convert testcases into self-testing Tcl startup files
Interface Overview
[Diagram: Mambo's interface layers: graphical user interface (GUI) script library; Tk & BLT; Mambo-specific commands; Tcl language; simulation engine. Alongside, the emitter writes to a shared-memory circular buffer of events, which is read by collectors: strip chart generator, software profiler, Qtrace generator, memory tracer, and trace file.]
    % sim ppc405gp foo
    % foo cpu 0 display gpr 2
    0x0000000000000000
    % while { [foo cpu 0 display spr pc] != 0xC00 } { foo cycle 1 }
Mambo Value (Visualization)
[Figure: L1 miss rates, 999 MHz, 1 GB/768 MB heap, Java MTRT, MP kernel. Instruction and data L1 miss rates (percentage misses/instructions) versus processor cycles (0 to 1.2×10^10).]
• Generate a signature of the workload, then run in a higher-fidelity (slower) mode at key segments.
• Correlate events (like cache misses) back to source and assembly code.
Supporting Efforts
• Operating systems
  – Device drivers
  – Linux PPC 32-bit
  – Linux PPC 64-bit
  – SSX
• Other utilities
  – RAM disk setup
  – Network daemon
• Cross-development tools
  – Compilers
  – Libraries
  – Binutils
  – GDB
  – Target: PPC64 Linux; Host: AIX, Linux/x86
  – Target: PPC32 Linux; Host: AIX, Linux/x86
Mambo Internals
Mambo Internals
• Runs on several hosts
  – Linux, AIX, MacOS-X
  – PowerPC, x86
  – Host-endian, big-endian, little-endian
• All written in C
  – Considered C++
  – gcc, xlc, g++
  – -Wall
  – Builds with no errors, no warnings
  – Consistent, portable C
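Because Mambo runs on both big-endian and little-endian hosts while the simulated PowerPC is big-endian, guest memory accesses cannot depend on host byte order. The following is a minimal sketch of one portable approach, assuming guest memory is a plain byte array; `guest_load32` and `guest_store32` are hypothetical names, not Mambo's API.

```c
#include <stdint.h>

/* Hypothetical helpers: guest memory is a byte array holding big-endian
 * data. Assembling values byte by byte gives the same result on
 * big-endian and little-endian hosts, with no #ifdef on host order. */
static uint32_t guest_load32(const uint8_t *mem, uint32_t pa)
{
    return ((uint32_t)mem[pa]     << 24) |
           ((uint32_t)mem[pa + 1] << 16) |
           ((uint32_t)mem[pa + 2] <<  8) |
            (uint32_t)mem[pa + 3];
}

static void guest_store32(uint8_t *mem, uint32_t pa, uint32_t value)
{
    mem[pa]     = (uint8_t)(value >> 24);   /* most significant byte first */
    mem[pa + 1] = (uint8_t)(value >> 16);
    mem[pa + 2] = (uint8_t)(value >>  8);
    mem[pa + 3] = (uint8_t)value;
}
```

A production simulator might add a host-endian fast path; this sketch favors portability.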
User Commands
[Diagram: the same layering as in the interface overview: GUI script library, Tk & BLT, Mambo-specific commands, Tcl language, simulation engine.]
• Tcl for commands
• Tk/BLT for GUI
• Tcl commands
  – Tcl scripts, for regression
  – C command interpreters
  – Command tables
    % sim ppc405gp foo
    % foo cpu 0 display gpr 2
    0x0000000000000000
    % while { [foo cpu 0 display spr pc] != 0xC00 } { foo cycle 1 }
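Command Tables
    mysim trigger set pc 0x4032 "breakpoint mysim"
[Diagram: nested command tables. Entries shown include config, display, go, step, load, trigger, clear, set, mambo, cycle, assoc, and pc; the example command walks mysim → trigger → set → pc.]
• Consistent command processing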
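Command tables let every command word be looked up the same way, which is one way to get consistent command processing. Below is a minimal sketch, assuming a NULL-terminated table per level; the table names, `dispatch`, and the `trigger_set_pc` handler are hypothetical and only mirror the `mysim trigger set pc` example.

```c
#include <stdio.h>
#include <string.h>

typedef int (*cmd_handler)(int argc, const char **argv);

/* Each entry names either a nested subtable or a handler. */
struct cmd_entry {
    const char *name;
    const struct cmd_entry *subtable;   /* NULL-terminated table, or NULL */
    cmd_handler handler;                /* used when subtable is NULL */
};

/* Hypothetical leaf handler: a real one would register a breakpoint action. */
static int trigger_set_pc(int argc, const char **argv)
{
    if (argc < 1)
        return -1;                      /* missing PC value */
    printf("trigger at pc %s\n", argv[0]);
    return 0;
}

static const struct cmd_entry trigger_set_table[] = {
    { "pc", NULL, trigger_set_pc },
    { NULL, NULL, NULL }
};

static const struct cmd_entry trigger_table[] = {
    { "set", trigger_set_table, NULL },
    { NULL, NULL, NULL }
};

static const struct cmd_entry machine_table[] = {
    { "trigger", trigger_table, NULL },
    { NULL, NULL, NULL }
};

/* Consume words left to right, descending one table per word. Returns the
 * handler's result, or -1 for an unknown word or an incomplete command. */
static int dispatch(const struct cmd_entry *table, int argc, const char **argv)
{
    while (argc > 0) {
        const struct cmd_entry *e = table;
        while (e->name && strcmp(e->name, argv[0]) != 0)
            e++;
        if (!e->name)
            return -1;                  /* unknown command word */
        argc--;
        argv++;
        if (e->subtable == NULL)
            return e->handler(argc, argv);
        table = e->subtable;
    }
    return -1;                          /* ran out of words inside a table */
}
```

In Mambo the words arrive from Tcl; here `dispatch` receives them as an argv-style array, and whatever follows the last table word (the PC value and the Tcl action string) is passed to the handler.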
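(Simulated) Machine Creation
• sim apache mysim
• sim ppc405 mysim
• sim ppc750 mysim
[Diagram: the Tcl sim command switches on the machine type to build apache, build 405, or build 750; each build creates that machine's devices (labels include console, ROM, PCI, UART, emac, and RTC).]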
Main Data Structures
[Diagram: a generic machine contains Thread[ ], Processor[ ], and Memory; a machine-specific part holds memory-mapped devices (ROM, PCI, RTC, …). State is kept at three levels: machine state, processor state, and thread state.]
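A minimal C sketch of how the structures in the diagram might nest, assuming one machine object that owns its processors, their hardware threads, physical memory, and a list of machine-specific memory-mapped devices. All type and field names are illustrative, not Mambo's.

```c
#include <stdint.h>
#include <stdlib.h>

/* Per-thread architected state (illustrative subset). */
struct thread_state {
    uint64_t gpr[32];
    uint64_t pc;
    uint64_t msr;
};

/* A processor owns one or more hardware threads. */
struct processor_state {
    int nthreads;
    struct thread_state *threads;
};

/* A machine-specific memory-mapped device claims a physical address range. */
struct mmio_device {
    const char *name;              /* e.g. "ROM", "PCI", "RTC" */
    uint64_t base, size;
};

/* Generic machine: processors and memory, plus machine-specific devices. */
struct machine {
    int nprocs;
    struct processor_state *procs;
    uint8_t *memory;               /* simulated physical memory */
    uint64_t mem_size;
    int ndevices;
    struct mmio_device *devices;
};

static void machine_destroy(struct machine *m)
{
    if (!m)
        return;
    for (int i = 0; m->procs && i < m->nprocs; i++)
        free(m->procs[i].threads);
    free(m->procs);
    free(m->memory);
    free(m->devices);
    free(m);
}

/* Allocate a machine with nprocs processors of nthreads threads each. */
static struct machine *machine_create(int nprocs, int nthreads, uint64_t mem_size)
{
    struct machine *m = calloc(1, sizeof *m);
    if (!m)
        return NULL;
    m->nprocs = nprocs;
    m->mem_size = mem_size;
    m->procs = calloc((size_t)nprocs, sizeof *m->procs);
    m->memory = calloc((size_t)mem_size, 1);
    if (!m->procs || !m->memory) {
        machine_destroy(m);
        return NULL;
    }
    for (int i = 0; i < nprocs; i++) {
        m->procs[i].nthreads = nthreads;
        m->procs[i].threads = calloc((size_t)nthreads, sizeof(struct thread_state));
        if (!m->procs[i].threads) {
            machine_destroy(m);
            return NULL;
        }
    }
    return m;
}
```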
Memory Access Functions
• EA_Read_Memory
  – Translate address
  – PA_Read_Memory
  – PA_Read_Cache_Memory
  – PA_Read_Memory_Bus
• Targets: memory and memory-mapped devices (ROM, PCI, RTC, …)
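The read path above can be sketched as a chain of functions, one per layer. This is an illustration under stated assumptions: translation is off (real mode, so PA equals EA), RAM is a 4 KB byte array, a single device register sits at a made-up address, device space bypasses the cache, and a counter stands in for the detailed cache model. Lowercase names such as `ea_read32` are hypothetical and are not Mambo's `EA_Read_Memory` family.

```c
#include <stdbool.h>
#include <stdint.h>

#define RAM_SIZE  4096u
#define DEVICE_PA 0xF0000000u            /* hypothetical device register address */

static uint8_t ram[RAM_SIZE];
static unsigned cache_accesses;          /* stand-in for a detailed cache model */
static bool cache_model_enabled = true;

/* Translation is off in this sketch (real mode), so PA == EA. A real MMU
 * path would consult ERAT/TLB/page tables and could fail. */
static bool translate(uint32_t ea, uint32_t *pa)
{
    *pa = ea;
    return true;
}

static uint32_t device_read32(uint32_t pa)
{
    (void)pa;
    return 0x12345678u;                  /* toy device: constant register value */
}

/* Bus: route the physical address to a device or to RAM. */
static uint32_t pa_read_bus32(uint32_t pa)
{
    if (pa == DEVICE_PA)
        return device_read32(pa);
    if (pa <= RAM_SIZE - 4)              /* big-endian RAM read */
        return (uint32_t)ram[pa] << 24 | (uint32_t)ram[pa + 1] << 16 |
               (uint32_t)ram[pa + 2] << 8 | (uint32_t)ram[pa + 3];
    return 0xFFFFFFFFu;                  /* unmapped; a real model would raise an error */
}

/* Cache layer: count the access, then fetch the data over the bus. */
static uint32_t pa_read_cache32(uint32_t pa)
{
    cache_accesses++;
    return pa_read_bus32(pa);
}

/* Physical read: RAM goes through the cache model, device space bypasses it. */
static uint32_t pa_read32(uint32_t pa)
{
    if (cache_model_enabled && pa < RAM_SIZE)
        return pa_read_cache32(pa);
    return pa_read_bus32(pa);
}

/* Effective-address read: translate, then do the physical read. */
static bool ea_read32(uint32_t ea, uint32_t *value)
{
    uint32_t pa;
    if (!translate(ea, &pa))
        return false;                    /* caller would raise a storage interrupt */
    *value = pa_read32(pa);
    return true;
}
```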
Event System
• Event
  – Time of event
  – Function to execute
  – Opaque data pointer
• Timers
• I/O interrupts
  – Disk
  – Console/keyboard
• Checkpoints
• User interaction
• Instruction execution
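An event as described above needs only a time, a callback, and an opaque pointer. The sketch below keeps pending events in a time-ordered singly linked list; this data structure is an assumption chosen for brevity (a heap would scale better), and `add_event`, `run_until`, and the periodic `timer_tick` example are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

struct event {
    uint64_t time;                      /* simulated cycle at which to fire */
    void (*fn)(void *data);             /* function to execute */
    void *data;                         /* opaque data pointer */
    struct event *next;
};

static struct event *queue;             /* sorted by time, earliest first */
static uint64_t now;                    /* current simulated time */

/* Insert in time order; equal-time events stay in FIFO order. */
static int add_event(uint64_t time, void (*fn)(void *), void *data)
{
    struct event *e = malloc(sizeof *e);
    if (!e)
        return -1;
    e->time = time;
    e->fn = fn;
    e->data = data;
    struct event **p = &queue;
    while (*p && (*p)->time <= time)
        p = &(*p)->next;
    e->next = *p;
    *p = e;
    return 0;
}

/* Fire every event due at or before `until`, advancing simulated time. */
static void run_until(uint64_t until)
{
    while (queue && queue->time <= until) {
        struct event *e = queue;
        queue = e->next;
        now = e->time;
        e->fn(e->data);                 /* callbacks may schedule new events */
        free(e);
    }
    now = until;
}

/* Example event: a periodic timer that counts ticks and re-arms itself. */
static unsigned timer_ticks;
static void timer_tick(void *data)
{
    uint64_t period = *(const uint64_t *)data;
    timer_ticks++;
    add_event(now + period, timer_tick, data);
}
```

Inserting equal-time events after existing ones keeps their order fixed from run to run, which supports the cycle-level repeatability this tutorial relies on for debugging and regression.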
Instruction Execution
1. Translate instruction address
2. Fetch instruction
3. Decode instruction
4. Execute instruction
5. Check for interrupts/exceptions (reset PC, MSR)
[Diagram: the sequence runs as an event and finishes by scheduling a new event.]
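The five steps can be sketched as a toy interpreter step. It assumes real mode (no translation) and a big-endian byte array for memory, and it decodes only `addi` (primary opcode 14); any other opcode takes a simplified program interrupt at vector 0x700, and an out-of-range fetch takes one at 0x400. The `struct cpu` layout and function names are hypothetical, and the interrupt entry is simplified (real hardware clears only selected MSR bits). Scheduling the next step through the event system is not shown.

```c
#include <stdint.h>

struct cpu {
    uint32_t gpr[32];
    uint32_t pc, msr, srr0, srr1;
    const uint8_t *mem;                 /* big-endian guest memory */
    uint32_t mem_size;
};

/* Simplified interrupt entry: save PC and MSR, clear the MSR, jump to the vector. */
static void take_interrupt(struct cpu *c, uint32_t vector)
{
    c->srr0 = c->pc;
    c->srr1 = c->msr;
    c->msr = 0;                         /* simplification, see lead-in */
    c->pc = vector;
}

static void step(struct cpu *c)
{
    /* 1. Translate the instruction address (real mode here, so PA == EA). */
    uint32_t pa = c->pc;
    if (c->mem_size < 4 || pa > c->mem_size - 4) {
        take_interrupt(c, 0x400);       /* instruction storage interrupt */
        return;
    }
    /* 2. Fetch the big-endian instruction word. */
    uint32_t insn = (uint32_t)c->mem[pa] << 24 | (uint32_t)c->mem[pa + 1] << 16 |
                    (uint32_t)c->mem[pa + 2] << 8 | (uint32_t)c->mem[pa + 3];
    /* 3. Decode the D-form fields: OPCD, RT, RA, and the sign-extended SI. */
    uint32_t opcd = insn >> 26;
    uint32_t rt = (insn >> 21) & 0x1F;
    uint32_t ra = (insn >> 16) & 0x1F;
    int32_t si = (int32_t)((insn & 0xFFFF) ^ 0x8000) - 0x8000;
    /* 4. Execute. */
    if (opcd == 14) {                   /* addi RT,RA,SI; RA == 0 means the literal 0 */
        c->gpr[rt] = (ra ? c->gpr[ra] : 0) + (uint32_t)si;
        c->pc += 4;
        return;
    }
    /* 5. Exception: unimplemented opcode -> program interrupt. */
    take_interrupt(c, 0x700);
}
```

The first test word, `0x38210050`, appears in the emitter trace later in this tutorial and decodes as `addi r1,r1,0x50`.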
Speed
• On a 1.2 GHz AMD Athlon processor
• Simple: 2,400 to 3,300 kilo-instructions per second (KIPS)
• Time is spent in:
  – Simple instruction execution: 18 to 20%
  – Address translation: 18 to 24%
  – Add event + process event: 7% + 4%
Development
• CVS source repository
  – Check out
  – Develop
  – Build
  – Regress
  – Check in
• Regression
  – Development: run 4 platforms (boot Linux, run ls, exit); takes about 2m15s
  – Nightly: run all platforms (boot OS, optionally run apps, exit); adding architectural tests and machine-specific test buckets; takes about 3h22m
Configuration
• Highly configurable
• config.h: #define of CONFIG options
  – Sizes of structures: TLBs, caches, ERATs, …
  – 32-bit or 64-bit
  – Hypervisor mode, threads
  – VMX, floating point, …
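A sketch of what compile-time configuration through `config.h` can look like: feature switches decide which fields exist at all, and size macros fix structure dimensions. Every `CONFIG_` name and field below is hypothetical.

```c
#include <stdint.h>

/* Hypothetical config.h fragment. */
#define CONFIG_64BIT        1
#define CONFIG_HYPERVISOR   0
#define CONFIG_VMX          1
#define CONFIG_TLB_ENTRIES  64
#define CONFIG_ERAT_ENTRIES 32

#if CONFIG_64BIT
typedef uint64_t target_ulong;      /* register width follows the configuration */
#else
typedef uint32_t target_ulong;
#endif

struct tlb_entry { target_ulong epn, rpn; };

struct mmu_state {
    struct tlb_entry tlb[CONFIG_TLB_ENTRIES];    /* sized at build time */
    struct tlb_entry erat[CONFIG_ERAT_ENTRIES];
#if CONFIG_HYPERVISOR
    target_ulong hv_state;                       /* only built with hypervisor support */
#endif
#if CONFIG_VMX
    uint8_t vr[32][16];                          /* 32 128-bit vector registers */
#endif
};
```

Disabled features cost neither memory nor run-time checks, at the price of rebuilding to change the machine.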
Emitter
• With or without emitter code
• Emit macro calls throughout the code
[Diagram: Mambo built with emit writes events into a shared-memory circular buffer; collectors (strip chart generator, software profiler, Qtrace generator, memory tracer, trace file) read from it.]
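A minimal sketch of the emitter pattern, assuming an in-process ring buffer in place of the shared-memory segment and a reader loop in place of the separate reader process. When `EMIT_ENABLED` is 0 the `EMIT` macro expands to nothing, matching the with-or-without-emitter build. The record types, `emit_put`, `reader_drain`, and the two counting collectors are hypothetical.

```c
#include <stdint.h>

#ifndef EMIT_ENABLED
#define EMIT_ENABLED 1
#endif

enum rec_type { REC_INST, REC_MEM_READ, REC_MEM_WRITE };

struct emit_record {
    enum rec_type type;
    uint64_t cycle, ea;
};

#define RING_SIZE 8u                     /* power of two so masking wraps indices */
static struct emit_record ring[RING_SIZE];
static uint32_t head, tail;              /* writer advances head, reader advances tail */

/* Returns 0 when full; a real writer would wait for the reader to catch up. */
static int emit_put(enum rec_type type, uint64_t cycle, uint64_t ea)
{
    if (head - tail == RING_SIZE)
        return 0;
    ring[head & (RING_SIZE - 1)] = (struct emit_record){ type, cycle, ea };
    head++;
    return 1;
}

#if EMIT_ENABLED
#define EMIT(type, cycle, ea) emit_put((type), (cycle), (ea))
#else
#define EMIT(type, cycle, ea) ((void)0)
#endif

/* Collectors are callbacks; every collector sees every record. */
typedef void (*collector_fn)(const struct emit_record *r);

static void reader_drain(collector_fn *collectors, int ncollectors)
{
    while (tail != head) {
        const struct emit_record *r = &ring[tail & (RING_SIZE - 1)];
        for (int i = 0; i < ncollectors; i++)
            collectors[i](r);
        tail++;
    }
}

/* Two toy collectors: an instruction counter and a memory-access counter. */
static unsigned n_inst, n_mem;
static void count_inst(const struct emit_record *r) { if (r->type == REC_INST) n_inst++; }
static void count_mem(const struct emit_record *r)  { if (r->type != REC_INST) n_mem++; }
```

Because the reader hands each record to every registered collector, attaching several collectors at once (for example a tracer and a strip chart) costs nothing extra on the simulator side.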
Emitter Records
    Reader Started
    READER: Shared Memory Key is 76952
    READER: Writer Data found at 700000000000000
    READER: Reader Data found at 700000020000000
    READER: Reader Waiting for Header in Emitter Data
    READER: Reader Found the Header in Emitter Data
    255: #1 CONFIG: CPUS:2 MEM_SIZE:100663296 CLOCK_FREQ:200 DISK_LAT:10 CPI:3.000000 L1D_BLK_SIZE:128 L1I_BLK_SIZE:32
    0: #2 MSR: 0
    1: #3 MSR: 0
    255: #4 IDLE_RANGE: 0x25270 <-> 0x2533c
    255: #5 PID_CREATE: PID:0 FLAG:512
    255: #6 PID_CREATE: PID:1 FLAG:0
    255: #7 EXEC_LOAD: PID:1 PATHNAME:'init' TXT_EA:0x10000100 TXT_VSID:0x60003018 TXT_LEN:34204 DATA_EA:0x20000398 DATA_VSID:0x6000140A DATA_LEN:4668
    255: #8 EXEC_LOAD: PID:1 PATHNAME:'/usr/lib/libcrypt.a[shr.o]' TXT_EA:0xD01980F8 TXT_VSID:0x60002814 TXT_LEN:2170 DATA_EA:0xF0001528 DATA_VSID:0x60003219 DATA_LEN:316
    …
    0: #84 CYCLE:0 INST: EA:10000468 VSID:2150000 PA:1A67468 INST=90610040
    0: #85 MemWrite: EA:2FF22CC0 VSID:2351000 PA:1BA3CC0 LENGTH:4 VALUE=0
    1: #86 CYCLE:0 INST: EA:11E954 VSID:0 PA:11E954 INST=F8AF00F0
    1: #87 MemWrite: EA:CBA70 VSID:0 PA:CBA70 LENGTH:8 VALUE=3
    1: #88 CYCLE:3 INST: EA:11E958 VSID:0 PA:11E958 INST=F8CF00F8
    1: #89 MemWrite: EA:CBA78 VSID:0 PA:CBA78 LENGTH:8 VALUE=393628
    0: #90 CYCLE:3 INST: EA:1000046C VSID:2150000 PA:1A6746C INST=48000004
    0: #91 CYCLE:6 INST: EA:10000470 VSID:2150000 PA:1A67470 INST=80010058
    0: #92 MemRead: EA:2ff22cd8 PA:1ba3cd8 LENGTH:4 VALUE=100003d8
    1: #93 CYCLE:6 INST: EA:11E95C VSID:0 PA:11E95C INST=F8EF0100
    1: #94 MemWrite: EA:CBA80 VSID:0 PA:CBA80 LENGTH:8 VALUE=0
    1: #95 CYCLE:9 INST: EA:11E960 VSID:0 PA:11E960 INST=F90F0108
    1: #96 MemWrite: EA:CBA88 VSID:0 PA:CBA88 LENGTH:8 VALUE=DEADBEEF
    0: #97 CYCLE:9 INST: EA:10000474 VSID:2150000 PA:1A67474 INST=7C0803A6
    0: #98 CYCLE:12 INST: EA:10000478 VSID:2150000 PA:1A67478 INST=38210050
    …
    255: #3304188 FOOTER
    READER: Emitter #0 exiting
    Reader Terminating
[Legend: CPU events, OS events, Sim events]
Emitter (cont.)
• Any number of collectors can be attached to a single instance of mambo-emit
• By combining the Qtracer and the strip chart collector, we can generate traces that also provide an overview of activity across the trace
Disk
• PCI
  – IDE
  – Standard device drivers
  – Disk drives: DiskSim 2.0 for timing
  – File for storage
  – Copy on write
• "Bogus" disks
  – Special Linux disk driver
  – Call-through to Mambo
  – Immediate I/O
  – Copy on write
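Copy-on-write keeps the disk image file unchanged while the simulated OS writes to its disk. A minimal sketch, assuming 512-byte sectors, whole-sector writes, and an in-memory per-sector overlay; `struct cow_disk` and its functions are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR_SIZE 512u

struct cow_disk {
    const uint8_t *base;      /* read-only base image (e.g. loaded from the disk file) */
    uint32_t nsectors;
    uint8_t **overlay;        /* per-sector private copy, NULL until first write */
};

static int cow_init(struct cow_disk *d, const uint8_t *base, uint32_t nsectors)
{
    d->base = base;
    d->nsectors = nsectors;
    d->overlay = calloc(nsectors, sizeof *d->overlay);
    return d->overlay ? 0 : -1;
}

/* Reads prefer the private copy if this sector was written during the run. */
static int cow_read(const struct cow_disk *d, uint32_t sector, uint8_t *buf)
{
    if (sector >= d->nsectors)
        return -1;
    const uint8_t *src = d->overlay[sector] ? d->overlay[sector]
                                            : d->base + (size_t)sector * SECTOR_SIZE;
    memcpy(buf, src, SECTOR_SIZE);
    return 0;
}

/* Writes never touch the base image. Whole-sector writes mean the base
 * contents need not be copied into the overlay first. */
static int cow_write(struct cow_disk *d, uint32_t sector, const uint8_t *buf)
{
    if (sector >= d->nsectors)
        return -1;
    if (!d->overlay[sector] && !(d->overlay[sector] = malloc(SECTOR_SIZE)))
        return -1;
    memcpy(d->overlay[sector], buf, SECTOR_SIZE);
    return 0;
}
```

Since the base image is never modified, every run can start from identical disk contents.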
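Mambo Devices
[Diagram: Mambo devices. The Mambo side shows a memory map (PCI, ROM, UART, IDE) and call-through support (bogus disk, bogus net, dump stats), with a disk image behind the disk models. The Linux/PPC side shows the bogus disk driver and PCI/IDE drivers. Two access mechanisms are labeled: a store to address 0xFFE00000, and a special illegal instruction with gpr3 = 126.]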
Debugging
• Debug prints for modules
  – Selectable at run time via a Tcl command
• Completely repeatable
  – To the cycle
  – Aids debugging
  – Regression depends on repeatability
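Run-time selectable debug prints can be as simple as a bit mask checked before formatting. In the sketch below the module names and `debug_*` functions are hypothetical, and the Tcl command that would set the mask is not shown. Output goes to the host's stderr and touches no simulated state, so enabling it does not perturb the cycle-exact run.

```c
#include <stdarg.h>
#include <stdio.h>

enum debug_module {
    DBG_CPU  = 1u << 0,
    DBG_MMU  = 1u << 1,
    DBG_DISK = 1u << 2,
    DBG_NET  = 1u << 3,
};

static unsigned debug_mask;            /* all output off by default */

static void debug_enable(unsigned modules)  { debug_mask |= modules; }
static void debug_disable(unsigned modules) { debug_mask &= ~modules; }

/* Print only when the module's bit is set; returns characters printed (0 if suppressed). */
static int debug_printf(unsigned module, const char *fmt, ...)
{
    if (!(debug_mask & module))
        return 0;
    va_list ap;
    va_start(ap, fmt);
    int n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```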
Debugging with GDB
[Diagram: the GNU Debugger connects over a socket to Mambo, whose simulated PowerPC runs Linux/K42 and the application.]
Mambo Uses: Software Profiling
Software Profiling
Mambo Uses: Performance and Power Modeling
Performance Simulation: Tempo
• Satisfy a need for performance/architecture research
• Requires a different methodology
  – Event-based for speed, concurrency modeling, and timing accuracy
  – Partition the system simulator at functional block boundaries
• Leverage Mambo functional infrastructure
  – Add timing-related modifications to existing units
  – Mix functional and cycle-accurate modules
  – Trade off accuracy for speed
• Code reuse
  – Leverage existing functional models when possible
Performance/Power Outline
• Structure:
  – Instruction semantics
  – Pipeline models
  – Functional units
  – Exceptions/interrupts
  – Instruction/data memory paths and address translation
• PPC405GP model
• Timing validation
• Event-based power model
• Power validation
• Work in progress:
  – Cache-coherent MP support
  – PPC750
Simple vs. Tempo
Snapshot of event interleavings in Tempo (timeline from T=0 to T=1):
• Move I1 from DCD to EXE
• Fetch next instruction
• Move I2 from fetch buffer to DCD
• Address translation for I4
• Move I3 from EXE to WB
• Predict branch in prefetch buffer
• Access Dcache for I4
• Change next fetch PC
• Flush instructions after branch
• Access Dcache for ....
Simple main loop:
    while (1) {
        FetchInstruction;
        DecodeInstruction;
        ExecuteSemantics;
        Set next PC;
        Interrupts? Exceptions?
    }
PowerPC Instruction Semantics
• Simple (functional):
  – One function per instruction (or instruction class)
• Tempo (cycle-accurate):
  – Structural and data dependences
      – Busy bits for architectural registers
      – Support for register renaming
      – Split semantics according to major pipeline stages
      – Functional units (more later)
  – Multiple functions: issue, writeback
  – Synchronizing instructions
  – Interrupts and exception timing/synchronization effects (more later)
Functional Units (Execution Stage)
• Multiple types
  – Load/store, FXU, FPU, etc.
  – Associate type(s) with every instruction
  – Used at issue stage
• Repeat rate (degree of pipelining)
  – Instruction dependent
• Latency
  – Per instruction
• Added to instruction decode table
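The sketch below shows one way latency and repeat rate could live in the decode table and be checked at issue. The instruction classes, unit types, and numbers are illustrative assumptions, not PPC405GP timings; `try_issue` is a hypothetical helper that returns the cycle at which the result is ready, or 0 on a structural stall.

```c
#include <stdint.h>

enum fu_type { FU_LSU = 1u << 0, FU_FXU = 1u << 1, FU_MUL = 1u << 2 };

struct decode_entry {
    const char *mnemonic;
    unsigned units;        /* bitmask of acceptable functional unit types */
    unsigned latency;      /* cycles until the result is available (>= 1) */
    unsigned repeat_rate;  /* cycles before the unit accepts another instruction */
};

/* Illustrative numbers only. */
static const struct decode_entry decode_table[] = {
    { "addi",  FU_FXU, 1, 1 },
    { "lwz",   FU_LSU, 2, 1 },
    { "mullw", FU_MUL, 4, 2 },
};

struct fu_state { enum fu_type type; uint64_t next_free; };

/* At issue: find a matching unit that is free this cycle, mark it busy for
 * repeat_rate cycles, and return the result-ready cycle (0 = stall). */
static uint64_t try_issue(struct fu_state *units, int nunits,
                          const struct decode_entry *d, uint64_t cycle)
{
    for (int i = 0; i < nunits; i++) {
        if ((units[i].type & d->units) && units[i].next_free <= cycle) {
            units[i].next_free = cycle + d->repeat_rate;
            return cycle + d->latency;
        }
    }
    return 0;
}
```

A repeat rate of 1 models a fully pipelined unit; a larger value blocks back-to-back issue to the same unit even when operands are ready.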
Pipeline Model
• Implementation specific
  – Similarities possible within an architecture family
• Granularity
  – Trade off speed against code complexity
  – One or multiple events?
  – Architecture plays an important role in the decision
• Interrupts/exceptions
• Synchronizing instructions
• Branch prediction
Exceptions and Interrupts
• Follow architecture definitions
• Precise exceptions
  – Implications for pipeline simulation
  – Instruction fetch effects
• Interrupt controller
  – Handle according to architecture semantics
  – Flush pipeline (wait for in-flight instructions if required)
  – Instruction fetch effects
• Synchronizing instructions
  – Share some of these restrictions
  – Not necessary for Simple
Instruction Fetch
• Not an issue for Simple
  – In order, no prefetching, and no branch prediction
• Implementation-specific instruction fetch unit
  – Fetch algorithm
  – Interaction with pipeline
  – Translation faults on prefetches, cache access, etc.
• Branch predictor
  – Not needed in Simple
Address Translation
• Timing cost of address translation
• Miss latency (hardware walks, exception handlers, etc.)
• Single/multi-level translation (caching effects)
  – SLB, TLB, ERAT, etc.
  – Simple uses ERATs for simulation speed
• Some instruction semantics affected
  – Speculative loads/stores
• Interface to cache/memory hierarchy
• Exception generation
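An ERAT-style cache speeds up simulation by remembering recent effective-to-real page translations, so the full SLB/TLB walk runs only on a miss. A minimal sketch, assuming 4 KB pages, a 64-entry direct-mapped table, and a toy `full_translate` that adds a fixed page offset; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define ERAT_ENTRIES 64u

struct erat_entry { bool valid; uint64_t epn, rpn; };
static struct erat_entry erat[ERAT_ENTRIES];
static unsigned slow_walks;      /* counts full translations to show the ERAT's benefit */

/* Stand-in for the full SLB/TLB/page-table translation (toy: fixed offset). */
static bool full_translate(uint64_t epn, uint64_t *rpn)
{
    slow_walks++;
    *rpn = epn + 0x100;
    return true;
}

static bool translate_ea(uint64_t ea, uint64_t *ra)
{
    uint64_t epn = ea >> PAGE_SHIFT;
    struct erat_entry *e = &erat[epn % ERAT_ENTRIES];
    if (!e->valid || e->epn != epn) {        /* ERAT miss: full walk, then refill */
        uint64_t rpn;
        if (!full_translate(epn, &rpn))
            return false;                    /* caller raises a storage interrupt */
        e->valid = true;
        e->epn = epn;
        e->rpn = rpn;
    }
    *ra = (e->rpn << PAGE_SHIFT) | (ea & ((1u << PAGE_SHIFT) - 1));
    return true;
}

/* Must run whenever translations change (e.g. tlbie or segment updates). */
static void erat_invalidate_all(void)
{
    for (unsigned i = 0; i < ERAT_ENTRIES; i++)
        erat[i].valid = false;
}
```

Any event that changes translations has to invalidate the cached entries; that bookkeeping is the price of the speedup.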
Memory Simulation
• Loads/stores generate translation events
• Successful translations generate cache accesses
• Detailed cache models
• Store buffers
• Aspects that may or may not be modeled in detail
  – Bus models
  – Memory models
  – Cache coherence
• Details needed for out-of-order cores and MP simulation, but not necessary for in-order cores (more later)
Tempo PPC405GP Model
Tempo PPC405GP Model (cont.)
[Diagram: pipeline stages PFB1, PFB0, DCD, EXE, and WB, with the instruction fetch unit and branch predictor, ITLB and DTLB backed by the UTLB, I cache and D cache, functional units, and the register file.]
PPC405GP Timing Details
• No performance counters in hardware!
• User’s Manual
  – Pipeline structure
  – Branch prediction
  – Instruction latencies and repeat rates
  – TLB structure and miss latencies
• Challenges
  – Incomplete documentation (e.g., sensitivity to core/bus speeds)
• Pecan board
  – PPC405GP-based evaluation board
  – SSX kernel
  – Small microbenchmarks to reverse engineer some cases
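Example Test Case (load/use bubble)
    loop:
        lwz   r2,0(r3)     # load into r2
        addi  r2,r2,1      # uses r2 right after the load: load/use bubble
        lwz   r2,0(r4)
        addi  r2,r2,1
        lwz   r2,0(r5)
        addi  r2,r2,1
        bctr               # branch back to loop; assumes CTR was preloaded with the address of loop
• R3, R4, R5 initially point to the same address (all hit after the initial miss)
• Adds are dependent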
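Example Test Case (load miss)
    loop:
        lwz   r2,0(r3)     # each load misses in the D cache
        addi  r2,r2,1      # dependent on the load above
        lwz   r2,0(r4)
        addi  r2,r2,1
        lwz   r2,0(r5)
        addi  r2,r2,1
        bctr               # branch back to loop; assumes CTR was preloaded with the address of loop
• R3, R4, R5 access different lines that map to the same set in the 2-way D cache (all miss)
• Adds are dependent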
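Why every load misses can be checked with a small 2-way set-associative cache model using LRU replacement. The line size and set count below are assumptions chosen for illustration (the real D-cache geometry should come from the PPC405GP User's Manual); with three lines mapping to one set, each access evicts the line needed next.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE 32u
#define NUM_SETS  128u

/* lru holds the way to evict next. */
struct cache_set { uint64_t tag[2]; bool valid[2]; int lru; };
static struct cache_set sets[NUM_SETS];

/* Returns true on a hit; on a miss, fills the least recently used way. */
static bool cache_access(uint64_t addr)
{
    uint64_t line = addr / LINE_SIZE;
    struct cache_set *s = &sets[line % NUM_SETS];
    uint64_t tag = line / NUM_SETS;
    for (int w = 0; w < 2; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->lru = 1 - w;               /* the other way is now least recently used */
            return true;
        }
    }
    int victim = s->lru;
    s->tag[victim] = tag;
    s->valid[victim] = true;
    s->lru = 1 - victim;
    return false;
}
```

Addresses one `LINE_SIZE * NUM_SETS` stride apart share a set, so the pattern r3, r4, r5, r3, … never hits, while repeated accesses to a single line (the load/use case) hit after the first miss.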
Timing Validation
• Compare simulated time with hardware
• Operating system: SSX kernel
• Hardware: PPC405GP-based Pecan board
• Applications:
  – EEMBC v. 1.0
  – 42 program/dataset combinations
• Timing:
  – Use on-board 405GP timer running at core frequency
  – Both cases use the same methodology
  – Disable interrupts during application run
Timing Validation Results
• Average timing error = 0.6%
• Standard deviation = 2.5%
• Error range = -4.6% to 7.1%
• Simulator speed:
  – ~570 KIPS on a 1.2 GHz AMD Athlon™ system
• Possible sources of error:
  – Memory details abstracted out (only average latency modeled)
  – Store buffer model (reverse engineered)
  – Rare branch cases (e.g., branch to self)
Power Model: Motivation
• Study operating system and application power behavior
  – Performance/architecture/power trade-offs
  – Validated timing and power
• Tempo model
  – Cycle-accurate simulation infrastructure
  – Event-based power model
  – PowerPC 405GP chosen for first model
      – ARL Pecan board (instrumented for power measurements)
      – National Instruments measurement system
      – Embedded processor (power is a limiting factor)
      – No event counters (a challenge for an event-based model)
Generating the Power Model
• $E_{total} = E_{idle} + \sum_{i} N_i E_i$
  – $E_{idle}$ is the idle (Wait State) energy
  – $N_i$ is the number of instances of event $i$
  – $E_i$ is the energy cost of event $i$
  – Events: caches, TLB, branches, instructions, etc.
• Microbenchmarks (run on Pecan board)
  – Fixed number of events, compare to idle
  – Create as many as necessary to isolate event energies (~300, of which only about 50 are needed)
  – Some details require more tests (more later)
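Given per-event counts from the simulator and per-event energies from the microbenchmarks, the model above reduces to a weighted sum. A minimal sketch; the struct and function names are hypothetical, and $E_{idle}$ is passed in already computed (idle power times run time).

```c
#include <stddef.h>

struct event_energy {
    const char *name;      /* e.g. "dcache_load_miss" (illustrative) */
    double energy_nj;      /* E_i: energy of one instance, in nanojoules */
    unsigned long count;   /* N_i: occurrences during the run */
};

/* E_total = E_idle + sum over i of N_i * E_i */
static double total_energy_nj(double idle_energy_nj,
                              const struct event_energy *ev, size_t n)
{
    double e = idle_energy_nj;
    for (size_t i = 0; i < n; i++)
        e += (double)ev[i].count * ev[i].energy_nj;
    return e;
}
```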
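Power Modeling Overview
[Diagram: benchmarks run both on the Tempo PPC405GP model and on the PPC405GP Pecan board. On the simulator side, the emitter writes events to a circular buffer, and a reader tool consumes them together with an event power cost table. On the hardware side, National Instruments measurement equipment records power through LabView.]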
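Example Microbenchmark
• Core/bus: 200 MHz/66 MHz
• Measurements:
  – Cycles/iteration ($C_{hit}$ and $C_{miss}$)
  – Average power ($P_{hit}$ and $P_{miss}$)
  – Time: $T = C \times 5\,\mathrm{ns}$ (one core cycle at 200 MHz)
  – Want $E_{load\_miss}$
• $\Delta E = P_{miss} T_{miss} - P_{hit} T_{hit}$
• $\Delta E = (T_{miss} - T_{hit}) P_{idle} + 3 E_{load\_miss}$ (three loads miss per iteration)
• Solve for $E_{load\_miss}$:
  $E_{load\_miss} = \frac{P_{miss} T_{miss} - P_{hit} T_{hit} - (T_{miss} - T_{hit}) P_{idle}}{3}$
    loop:
        lwz   r2,0(r3)
        addi  r2,r2,1
        lwz   r2,0(r4)
        addi  r2,r2,1
        lwz   r2,0(r5)
        addi  r2,r2,1
        bctr               # branch back to loop; assumes CTR was preloaded with the address of loop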
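The final step, solving for $E_{load\_miss}$, can be written as a small function. This is a sketch: the inputs would be the per-iteration cycle counts and average powers measured on the board, the 5 ns cycle time corresponds to the 200 MHz core clock, and the divisor 3 is the number of loads that miss per loop iteration. The function name is hypothetical.

```c
#define CYCLE_NS 5.0   /* one core cycle at 200 MHz */

/* Energy of one load miss in nanojoules (watts x nanoseconds = nanojoules). */
static double load_miss_energy_nj(double c_hit, double p_hit_w,
                                  double c_miss, double p_miss_w,
                                  double p_idle_w)
{
    double t_hit  = c_hit  * CYCLE_NS;                     /* T = C x 5 ns */
    double t_miss = c_miss * CYCLE_NS;
    double delta_e = p_miss_w * t_miss - p_hit_w * t_hit;  /* extra energy per iteration */
    /* delta_e = (T_miss - T_hit) * P_idle + 3 * E_load_miss */
    return (delta_e - (t_miss - t_hit) * p_idle_w) / 3.0;
}
```

Subtracting the idle term removes the energy the longer miss iteration would consume even with no extra activity, leaving only what the three misses add.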
Modeled Events
• Base (idle) power
  – Power consumption when the processor is in the Wait State
• Average switching due to instructions
  – Does not take ordering or instruction type into account
• Load/store hits/misses
• TLB manipulation instructions
• Energy cost of different instruction types (ALU, Mult, Div, etc.)
• Branches (decoded in PFB or DCD, mispredictions, etc.)
• Instruction cache hits/misses
• ITLB and DTLB misses (that hit in UTLB)
• Data cache flushes
  – Dirty line replacements
Power Validation
• Validate energy:
  – Sum energy events across an application run
  – Integrate power/time samples measured on hardware
  – Same 42 EEMBC applications (excluding short-running ones)
• Results:
  – Average error = -4.1%
  – Standard deviation = 5.1%
  – Range = -11.3% to 6.6%
• Power validation (runtime behavior)
  – Plot measurement data against simulated power
Power Validation
[Figure: cjpeg. Power (watts) versus time (ms), hardware vs. simulation.]
Power Validation
[Figure: fft.sine. Power (watts) versus time (ms), hardware vs. simulation.]
Power Validation
[Figure: matrix. Power (watts) versus time (ms), hardware vs. simulation.]
Power Validation
[Figure: djpeg. Power (watts) versus time (ms), hardware vs. simulation.]
Power Modeling
• Live, validated power modeling of a PowerPC 405GP system
• Software profile by power or timing
Sources of Error
• Wait State power as the base
  – Active but idle power is different
• Instruction/data switching
  – Used average value for instruction switching power
  – Due to pipeline/control path energy variations
  – Instruction ordering is important
• Missed some important events?
• Did not isolate events that do not occur together
• Could not isolate store buffer power aspects
• Others?
Future/Current Work
• Detailed shared-memory multiprocessor simulation
  – Snooping and directory-based cache-coherence protocols
  – Other detailed cores
  – Detailed interconnects
• PPC405LP
  – Voltage and frequency scaling support
• Tempo model of PPC750
  – Out-of-order processor
• Linux bring-up
• Tempo models of peripherals?
  – Disks
  – Network interface cards
  – Many others
Challenges
• One simulation source repository
  – About 250K lines of mostly C code (including comments)
• Cycle-accurate simulation models
• More IBM development opportunities
• Runtime machine configuration
  – Configure and build machines at the command prompt
  – Ability to plug in new models dynamically
• More faithfully model real systems
  – Minimize the OS changes needed to run on Mambo
• Verification of models
• More workloads
References
• H. Shafi, P. Bohrer, J. Phelan, C. Rusu, and J. Peterson, "Design and Validation of the Mambo Performance and Power Simulator for PowerPC Systems," to appear in the IBM Journal of R&D.
• C. May, E. Silha, R. Simpson, and H. Warren (eds.), "The PowerPC Architecture: A Specification for a New Family of RISC Processors," Morgan Kaufmann Publishers, 1994.
• PPC405GP User’s Manual, IBM, 2000.
• PPC405LP User’s Manual, IBM, 2003.
• G. Ganger, "System-Oriented Evaluation of I/O Performance," Ph.D. dissertation, University of Michigan, 1995.
• http://mambo.austin.ibm.com (IBM internal)
Questions:
• Contact information:
  – Pat Bohrer (pbohrer@us.ibm.com)
  – James L Peterson (petersjl@us.ibm.com)
  – Hazim Shafi (hshafi@us.ibm.com)
  – ARL: http://www.research.ibm.com/arl