D214 Using Simulated Hardware – Virtualized Software Development Jakob Engblom


Virtualized Software Development

Traditional Development Process

• Use hardware to test and debug software
– Expensive
– Unwieldy environment for debugging
– Not available early enough
– Often, not every developer can have a test bed

Virtutech Simics

• Virtualized Software Development
– Simulates the system under development
– Simulates the (immediate) environment
– Runs entire software image unchanged
– Same build chain as physical system

• Benefits over physical hardware
– Customizable
– Cheaper
– Programmer-friendly
– Scriptable & controllable
– Available earlier
– More flexible

Simics Technology

How do we achieve this?

Virtualizing the Hardware

[Figure: the complete production software (user programs, middleware, database, server, operating system, drivers, firmware) runs on virtual hardware. The example system is a telecom network with application, service, control, and connectivity layers: MSC, SGSN, GGSN, HSS, GMSC/Transit, CSCF, MGCF, media gateways, application and service capability servers, RNC, RBS, BSC, BTS, GSM/EDGE, WCDMA, PSTN/ISDN, backbone switches/routers, Internet/intranets, and the TSP, AXD, CPP, AXE, and WPP platforms.]

[Figure: the virtual hardware contains CPUs, RAM, flash, ROM, ASIC, LCD, disk and disk controller, network, and PCI and I2C buses.]

• The software can’t tell the difference
• Identical build tool chain
• Runs binaries from real target

Abstraction Levels

Levels of abstraction, from most abstract to most detailed:

• API level: Service API (Java library), Operating System API Standard (POSIX), Operating system API (VxSim)
– Too abstract to provide information on actual target behavior
– Not same binaries as target, additional build chain
• HW/SW interface: Functional instruction-set & transaction-level device behavior
– Stable & narrow interface, enables fast execution
• Detailed hardware level: Cycle-accurate instruction-set, Timing-correct cycle-level (SystemC), Implementation-level (VHDL/Verilog)
– Excessive detail gives very slow simulation

Simics Virtual Hardware

• Very high execution speed
– 100s to 1000s of MIPS
– JIT technology ISS
• Timing approximated
– Does not model caches, pipelines, buses, and device implementation details
– Covers 80% to 95% of real-time systems development
• Complete system
– Networks, multiple machines, multicore, etc.
• Processors
– Complete instruction set
– Functionally identical to the real processor
– User and supervisor modes
– Memory-management unit
• Devices
– Function modeled
– Transaction-level abstraction
• Networks
– Packet/message-level

Simics Compared to...

Development cards
• Both run real binaries
• HW has real timing
• Simics offers convenient debug
• Simics often faster
• Simics available before hardware
• Fault injection possible in Simics
• Simics better at networks

Host-compiled simulation
• Simics runs real binaries, not host-compiled code
• Simics requires no special build and OS emulation layer
• Simics provides correct relative execution speed between nodes
• Both handle networks
• Host-compiled might be faster

Instruction-set simulators
• Simics provides the whole system
• Simics faster than traditional ISS

Cycle-accurate simulation
• More detailed timing than Simics
• Low-level timed interaction visible
• Simics much faster (10x to 100x)
• Models take more time to create

Software Development With Simics

Handy Features of Simics

• Checkpointing
– Store current state; pick up and continue later
– Position workload once, use many times
– Distribute a known system state for a software load
– Package a bug for parallel investigation by many engineers
• Determinism
– Same initial state gives same execution
– Repeat the same execution any number of times
– Investigate a problem time after time
– Simplifies problem reproduction
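The interplay of checkpointing and determinism can be illustrated with a minimal sketch. It does not use the Simics API; `ToyMachine`, `save_checkpoint`, and `restore_checkpoint` are hypothetical names, and the "machine" is just a small deterministic state update. The point is only that a saved state, restored twice and run for the same number of steps, yields the same execution.

```python
import copy


class ToyMachine:
    """A tiny deterministic 'machine': all state lives in plain fields."""

    def __init__(self, seed):
        self.cycles = 0
        self.lcg_state = seed          # state of a linear congruential generator
        self.memory = [0] * 8

    def step(self):
        # The next state depends only on the current state, never on host
        # time or host randomness. That is what makes reruns repeatable.
        self.lcg_state = (1103515245 * self.lcg_state + 12345) % (2 ** 31)
        self.memory[self.lcg_state % len(self.memory)] += 1
        self.cycles += 1

    def run(self, steps):
        for _ in range(steps):
            self.step()


def save_checkpoint(machine):
    """Capture the complete machine state (hypothetical helper)."""
    return copy.deepcopy(machine.__dict__)


def restore_checkpoint(state):
    """Build a fresh machine from a saved state (hypothetical helper)."""
    machine = ToyMachine(seed=0)
    machine.__dict__ = copy.deepcopy(state)
    return machine


# Position the workload once...
machine = ToyMachine(seed=42)
machine.run(1000)
checkpoint = save_checkpoint(machine)

# ...then use the checkpoint many times, for example by different engineers.
first = restore_checkpoint(checkpoint)
second = restore_checkpoint(checkpoint)
first.run(500)
second.run(500)
same_execution = first.__dict__ == second.__dict__
```

In this sketch `same_execution` is true because nothing outside the saved state influences `step`; a real simulator has to make the same guarantee for every processor and device model.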

Handy Features of Simics (continued)

• Visibility (insight without intrusion)
– All state can be observed
– All events can be traced and logged
• Controllability
– Any part of machine or state can be changed
– Fault injection an interesting special case
• Virtual time
– Time is completely virtual
– Global synchronization across all machines
– Single-step code on one machine or processor with no “flooding”
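A minimal sketch of virtual time with global synchronization, assuming a simple quantum-based round-robin scheme (an illustrative choice, not a description of how Simics schedules machines). `Node` and `VirtualTimeScheduler` are hypothetical names. When the scheduler stops, every node is stopped at the same virtual time, and state can be changed from outside before execution continues.

```python
class Node:
    """One simulated machine with its own view of virtual time."""

    def __init__(self, name):
        self.name = name
        self.local_time = 0      # virtual time in cycles
        self.counter = 0         # stands in for software-visible state

    def run_until(self, deadline):
        while self.local_time < deadline:
            self.counter += 1
            self.local_time += 1


class VirtualTimeScheduler:
    """Advances all nodes in lockstep, one time quantum at a time."""

    def __init__(self, nodes, quantum):
        self.nodes = nodes
        self.quantum = quantum
        self.now = 0

    def advance(self, cycles):
        target = self.now + cycles
        while self.now < target:
            self.now = min(self.now + self.quantum, target)
            for node in self.nodes:
                node.run_until(self.now)


nodes = [Node("a"), Node("b"), Node("c")]
sched = VirtualTimeScheduler(nodes, quantum=10)

sched.advance(25)
# Global stop: no node has run ahead of the others.
times_after_stop = [node.local_time for node in nodes]

# Controllability: change state on one node while the whole system is stopped
# (a simple form of fault injection), then continue.
nodes[1].counter = -1000
sched.advance(5)
```

Because nothing runs while the scheduler is idle, single-stepping one node does not let the other nodes flood it with events in the meantime.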

Handy Features of Simics (continued)

• Configurability
– Any parameter of system can be changed
• Sandboxing
– Simulated machine completely isolated
– Allows investigating “nasty code”
• Reverse debugging
– Roll back execution to previous state
– Reverse breakpoints
– Investigate details of program errors
– Reduces time to find hard bugs and development risk


Reverse Debugging

• Going forwards

• Back up and find out what happened

• For an entire system, including distributed and multiprocessor systems
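One common way to build reverse execution on top of a deterministic simulator is to keep periodic checkpoints and re-execute forward from the nearest one. The sketch below illustrates that idea under stated assumptions; it is not the Simics implementation, and `Machine`, `ReverseDebugger`, `go_to`, and `reverse_until` are hypothetical names.

```python
import copy


class Machine:
    """Deterministic toy machine; `time` counts executed steps."""

    def __init__(self):
        self.time = 0
        self.acc = 1

    def step(self):
        self.acc = (self.acc * 3 + self.time) % 1009
        self.time += 1


class ReverseDebugger:
    def __init__(self, machine, interval):
        self.machine = machine
        self.interval = interval
        self.checkpoints = {0: copy.deepcopy(machine.__dict__)}

    def run_forward(self, steps):
        for _ in range(steps):
            self.machine.step()
            if self.machine.time % self.interval == 0:
                self.checkpoints[self.machine.time] = copy.deepcopy(
                    self.machine.__dict__)

    def go_to(self, target_time):
        """Restore the nearest earlier checkpoint, then replay forward."""
        base = max(t for t in self.checkpoints if t <= target_time)
        self.machine.__dict__ = copy.deepcopy(self.checkpoints[base])
        while self.machine.time < target_time:
            self.machine.step()

    def reverse_until(self, predicate):
        """Reverse breakpoint: latest earlier time at which predicate holds."""
        for t in range(self.machine.time - 1, -1, -1):
            self.go_to(t)
            if predicate(self.machine):
                return t
        return None


m = Machine()
dbg = ReverseDebugger(m, interval=16)
history = []                       # history[t] is the value of acc at time t
for _ in range(100):
    history.append(m.acc)
    dbg.run_forward(1)

dbg.go_to(37)                      # back up from time 100 to time 37
snapshot = (m.time, m.acc)
hit = dbg.reverse_until(lambda mach: mach.acc == history[20])
```

The replay reproduces exactly the earlier state only because `step` is deterministic; the checkpoint interval trades memory for the amount of re-execution needed per backward jump.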

Networked System Testing

[Figure: a Simics network link simulation connects simulated nodes (simulated hardware, OS, application). The system under test and the other nodes on the simulated network run inside Simics, alongside a traffic generator, a network listener and tester, and behavior models. A real-network connection attaches real-world nodes (real hardware, OS, application) to the simulated network.]

Incomplete Systems/Scaffolding

• Virtual environments are very useful for incomplete systems
– If a bootrom does not exist, load OS directly into memory and configure system state
– To quickly get a prototype board, stub hardware with fixed values for registers
– Use breakpoints and “cheat” to fake OS functionality not yet implemented
– Model other network nodes by behavior, not as concrete hardware with real software
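Stubbing hardware with fixed register values can be sketched as follows. This is an illustration only: `StubDevice`, the register offsets, and `driver_init` are hypothetical, and a real model would be written against the simulator's device interfaces.

```python
class StubDevice:
    """Returns fixed values for reads and logs (but otherwise ignores) writes."""

    def __init__(self, fixed_values, default=0):
        self.fixed_values = dict(fixed_values)
        self.default = default
        self.write_log = []

    def read(self, offset):
        return self.fixed_values.get(offset, self.default)

    def write(self, offset, value):
        self.write_log.append((offset, value))


# Hypothetical register layout of a device that has not been modeled yet.
STATUS = 0x00      # bit 0: device ready
VERSION = 0x04
ENABLE = 0x08


def driver_init(dev):
    """Toy driver code: checks the ready bit, then enables the device."""
    if (dev.read(STATUS) & 0x1) == 0:
        return False
    dev.write(ENABLE, 0x1)
    return True


# The stub always reports "ready", so driver bring-up can proceed.
stub = StubDevice({STATUS: 0x1, VERSION: 0x0102})
initialized = driver_init(stub)
```

The write log gives a cheap record of what the software tried to do, which helps decide which registers need real behavior next.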

Multicore


The Multicore Revolution is Here!

• The massive move to parallel computers and multiple processor cores instead of single processors has been trumpeted before.
• This time it is for real. Why?
• More instruction-level parallelism hard to find
– Very complex designs needed for small gain
• Clock frequency scaling is slowing drastically
– Too much power and heat when pushing envelope
• Cannot communicate across chip fast enough
– Better to design small local units with short paths
• Effective use of billions of transistors
– Easier to reuse a basic unit many times
• Potential for very easy scaling
– Just keep adding processors/cores for higher performance

Embedded Multicore

• Multiprocessor and multicore systems are the future for embedded systems
– Dominant in server market since 1980s
– Prevalent in SoC design since 2000
– Standard on the desktop in 2006
• Now the only option for maximum performance

Vendor     Chip          Max #Cores  Arch           AMP/SMP
ARM        ARM11 MPCore  4           ARMv6          AMP, SMP
Cavium     Octeon CN38   16          MIPS64         X
PA Semi    PA6T custom   8           PPC            X
Freescale  MPC8641D      2           PPC            AMP, SMP
IBM        970MP         2           PPC64          X
IBM        Cell          9           PPC64, DSP
Raza       XLR 7-series  8           MIPS64         X
TI         OMAP2         3           ARM, C55, IVA

A single X marks either AMP or SMP; the column is not recoverable from the slide text. Two further X marks appear in the slide text without a row.

Software on Multicore is Hard

• Parallelism required to gain performance
– Parallel hardware is “easy” to design
– Parallel software is known to be hard to write
• Existing software assumes single-processor
– Multitasking != multiprocessor-ready
– Software breaks in new interesting ways on multiprocessors
• True concurrency is fundamentally hard
– Human minds have a hard time with concurrency
– Especially in complex software systems
– Some phenomena cannot occur on a single processor running multiple threads, only on true multiprocessors

Multiprocessors & Debug

• Limited visibility into hardware
– Single debug port, multiple processors
– High speed, concurrent execution
• Timing-sensitive
– Small changes in timing alter system behavior radically
– Hardware variations impact software behavior
• Indeterminism
– Rerunning a program gives different results
– Hard to reproduce bugs
• Heisenbugs
– Inserting probes to trace behavior alters behavior
– Bugs hide when they are being debugged
• Other cores keep running even when one core is stopped

Three Steps of Debugging

1. Provoking errors
– Forcing the system to a state where things break
2. Reproducing errors
– Recreating a provoked error reliably
3. Locating the source of errors
– Investigating the program flow and data
– Depends on success in reproduction

Simics helps with all three steps

Debugging Multicore... in Simics

1. Provoking errors
– Vary configuration, processor speeds, latencies
– Force corner cases to occur
2. Reproducing errors
– Checkpoints & determinism make reproduction easy
– No Heisenbugs
– Error situations easy to package and distribute
3. Locating the source of errors
– Reverse debugging a key tool
– Global time synchronization & global stop
– No probe effect from instrumentation and tracing
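The first two steps can be sketched with a toy model in which the interleaving of two cores is an explicit, controllable input. This is an assumption-laden illustration, not a Simics feature description: `core_program` and `run_schedule` are hypothetical names, and each core performs a deliberately non-atomic increment of a shared counter.

```python
from itertools import permutations


def core_program(shared):
    """Non-atomic increment: a load and a store with a scheduling point between."""
    local = shared["counter"]          # load
    yield
    shared["counter"] = local + 1      # store
    yield


def run_schedule(schedule):
    """Run two cores; `schedule` lists which core executes its next step."""
    shared = {"counter": 0}
    cores = [core_program(shared), core_program(shared)]
    for core_id in schedule:
        next(cores[core_id])
    return shared["counter"]


# Step 1, provoking: vary the interleaving until the corner case shows up.
schedules = sorted(set(permutations([0, 0, 1, 1])))
failing = [s for s in schedules if run_schedule(s) != 2]

# Step 2, reproducing: a failing schedule fails every time it is replayed,
# so it can be stored and handed to someone else.
replayed = [run_schedule(failing[0]) for _ in range(3)]
```

On real hardware the schedule is decided by timing that the developer does not control, which is why the lost update appears only occasionally; here it is an input, so the bug is available on demand.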

Building Virtual Hardware

Building a Model

• Prerequisite to obtaining benefits of virtualized software development
• How do we achieve this?

[Figure: the development hardware (a backplane connecting boards with CPU, RAM, flash, DSP, Ethernet, and other devices) is mirrored by a virtual development platform built as a Simics model.]

Modeling Your System

[Figure: the system architecture (processors, devices, memory, flash, ASIC, network, interconnect) is mapped to a Simics configuration of corresponding component models.]

• Use Simics framework
• Reuse VT components
– Large library available
• Adapt VT components
• Model custom parts
– DML
– C, C++
– Python
• Device modeling by
– Virtutech
– Customer
– Partner
– Consultant

The DML Modeling Language

• Domain-specific for creating fast hardware device models
• Declarative style
• Fast compiled models
• Models binary redistributable
• Efficient coding
– 5 times smaller than C
– Quick start modeling
– Iterative lazy development
– Much faster than SystemC
• Modeling time:
– Depends on model complexity
– Hours to days to weeks

    bank b {
        register DMA_control size 4 @ 0x20 {
            field EN  [31]   "Enable DMA";
            field SWT [30]   "Software Trigger";
            field TS  [15:0] "Transfer size";
            method after_write(memop) {
                inline $do_dma_transfer();
            }
        }
        register DMA_source size 4 @ 0x24;
        register DMA_dest   size 4 @ 0x28;

        method do_dma_transfer() {
            if ($DMA_control.EN == 1) {
                local uint16 count = $DMA_control.TS;
                local uint8 local_buf[4];
                local exception_type_t result;

                while (count > 0) {
                    // copy memory details elided...
                    $DMA_source += 4;
                    $DMA_dest += 4;
                    count -= 1;
                }
                // clear SWT bit, update TS
                $DMA_control.SWT = 0;
                $DMA_control.TS = count;
    ...
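For readers unfamiliar with DML, the behavior of the DMA register bank on this slide can be approximated in plain Python. This is a sketch under assumptions, not a translation produced by any tool: the class name `DmaModel` is hypothetical, and the memory copy that the slide elides is assumed here to be a 4-byte copy within a flat `bytearray`.

```python
EN_BIT = 1 << 31       # field EN  [31]   "Enable DMA"
SWT_BIT = 1 << 30      # field SWT [30]   "Software Trigger"
TS_MASK = 0xFFFF       # field TS  [15:0] "Transfer size"

DMA_CONTROL = 0x20
DMA_SOURCE = 0x24
DMA_DEST = 0x28


class DmaModel:
    """Transaction-level model: only register reads and writes are visible."""

    def __init__(self, memory):
        self.memory = memory                     # flat bytearray (assumption)
        self.regs = {DMA_CONTROL: 0, DMA_SOURCE: 0, DMA_DEST: 0}

    def read(self, offset):
        return self.regs[offset]

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF
        if offset == DMA_CONTROL:                # mirrors after_write()
            self._do_dma_transfer()

    def _do_dma_transfer(self):
        control = self.regs[DMA_CONTROL]
        if not control & EN_BIT:
            return
        count = control & TS_MASK
        src, dst = self.regs[DMA_SOURCE], self.regs[DMA_DEST]
        while count > 0:
            # The slide elides the copy; a plain 4-byte copy is assumed here.
            self.memory[dst:dst + 4] = self.memory[src:src + 4]
            src += 4
            dst += 4
            count -= 1
        self.regs[DMA_SOURCE], self.regs[DMA_DEST] = src, dst
        # Clear the SWT bit and update TS with the remaining count.
        control &= ~SWT_BIT
        control = (control & ~TS_MASK) | count
        self.regs[DMA_CONTROL] = control & 0xFFFFFFFF


memory = bytearray(64)
memory[0:8] = bytes(range(1, 9))
dma = DmaModel(memory)
dma.write(DMA_SOURCE, 0)
dma.write(DMA_DEST, 32)
dma.write(DMA_CONTROL, EN_BIT | SWT_BIT | 2)     # transfer two 4-byte words
```

The Python version needs explicit masks and a register dictionary for what DML expresses declaratively with `register` and `field`, which gives a feel for the code-size claim on the slide.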

Case Studies

Case Study: SwitchCore

• Problem: SwitchCore needed to develop and test drivers and protocol stacks for their next generation Xpeedium3 chips
• Challenges:
– Silicon not yet available
– SwitchCore customers need to evaluate performance and to develop their own software layers
– Previously had used an internal simulator, but it was slow and expensive to maintain
• Solution: Model Xpeedium3 using Simics
• Benefits:
– Internal software development (including offshore)
– Customers can develop their own software using the same model
– Reduced delay between prototype availability and production orders

Switchcore Board Components

[Figure: block diagram of a board with CPU. Components shown: PPC 8548 CPU running OS, apps, and driver; RAM; flash; EEPROM; UART (serial); eTSEC; PCIe and a PCIe switch; I2C and an I2C hub; MDIO; PHYs; the X3 chip, connected over PCIe, I2C, MDIO, xMII, and a custom link; Ethernet to the front panel and the backplane.]

Case Study: Wind River/8641D

• Problem: Wind River needed to develop software for the Freescale MPC8641D dual-core PowerPC SoC
• Challenges:
– No prototype silicon was available
– Silicon schedule was slipping but customers still required Wind River support on schedule
– 8641D is a dual-core chip, so this is not a straightforward port
• Solution:
– Wind River’s engineering organization ported VxWorks using Virtutech Simics with the 8641D processor model
• Benefits:
– Development could start ahead of silicon
– Improved productivity, improved software quality, earlier availability of 8641D software to Wind River’s customers

Case Study: Ericsson

• Problem: Ericsson needed to test software on a large range of base-station configurations
• Challenges:
– Hardware is expensive and takes 2-14 days to reconfigure before testing
– Systems can have up to 66 boards and 700 processors
– Test teams are geographically distributed
• Solution: Create Simics models for each board and handle all re-configuration through scripts
• Benefits:
– Enormous reduction in cost of capital equipment used for testing
– Can reconfigure a system almost instantly
– Can handle even a fully populated system

Benefits Summary

Virtualized Software Development

Hardware-based methodology compared with virtualized software development:

• Hardware-based: Application development has to get started with hand-built scaffolding
– Virtualized: Software development can start once basic architectural decisions have been made
• Hardware-based: Separate development methodologies for application and lower-level software
– Virtualized: Uniform development methodology; far more iterations, lower risk & higher quality
• Hardware-based: System integration unpredictable and always on the critical path
– Virtualized: System integration is quick and uncovers fewer problems: quality built in much earlier
• Hardware-based: Long delay between hardware availability and product shipment (revenue)
– Virtualized: Minimal delay between hardware availability and FCS (and $)

Driving Quality Sooner

[Figure: number of defects removed over time, across the phases software development, integration and test, and deployed, with the customer ship date marked. One curve shows development with hardware only, the other development with virtualized software development.]

• Development starts earlier
• More defects found during development phase
• Fewer defects found during integration
• Product ships earlier
• Higher quality

Questions?


Remember! Enter the evaluation form and be a part of making Øredev even better.

You will automatically be part of the evening lottery

Spare Slides

Simics Modeling Level: Processor

• Instruction-set simulation (ISS)
• Goal is very high performance
• Complete and correct processor functionality
– All instruction semantics bit-correct vs real machine
– Supervisor-mode & user-mode
– Runs the complete target instruction set
   • Including Altivec, SSE, 3dNow, VIS, etc. extensions
– All accessible values represented
   • User-level registers
   • Supervisor-level registers
   • Model-specific registers, ASIs, debug registers, etc.
• Memory-management unit
• Timing abstracted
– Add details if required

Simics Modeling Level: Devices

• Hardware modeled as a set of devices
– Memory map of machine (as seen by processor)
– At the programming register level
• Model the program-visible behavior
– Configuration registers
– Control registers
– Data transmitted & received
• Transaction-level modeling
– Reads, writes, DMA transfers, network packets
– For high-performance models
• ASICs & FPGAs
– Model programming interface behavior
– Not detailed implementation
• Detailed timing can be added if required
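A minimal sketch of a memory map as seen by the processor: each access is a single transaction that is routed by address to a device model, which only implements its program-visible registers. The classes `Ram`, `Uart`, and `MemoryMap`, and the addresses used, are hypothetical and chosen for illustration.

```python
class Ram:
    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, offset):
        return self.data[offset]

    def write(self, offset, value):
        self.data[offset] = value & 0xFF


class Uart:
    """Toy UART: a transmit register at offset 0, nothing else modeled."""

    def __init__(self):
        self.transmitted = []

    def read(self, offset):
        return 0

    def write(self, offset, value):
        if offset == 0:
            self.transmitted.append(chr(value & 0xFF))


class MemoryMap:
    """Routes each read or write transaction to the device mapped there."""

    def __init__(self):
        self.regions = []                      # (base, size, device)

    def map(self, base, size, device):
        self.regions.append((base, size, device))

    def _lookup(self, address):
        for base, size, device in self.regions:
            if base <= address < base + size:
                return device, address - base
        raise ValueError(f"unmapped address {address:#x}")

    def read(self, address):
        device, offset = self._lookup(address)
        return device.read(offset)

    def write(self, address, value):
        device, offset = self._lookup(address)
        device.write(offset, value)


bus = MemoryMap()
ram, uart = Ram(0x1000), Uart()
bus.map(0x0000_0000, 0x1000, ram)
bus.map(0x8000_0000, 0x10, uart)

bus.write(0x10, 0xAB)                          # lands in RAM
for ch in "hi":
    bus.write(0x8000_0000, ord(ch))            # lands in the UART
```

Nothing here models bus cycles or signal timing; the software only observes that a write to the UART address produces a character, which is the level of detail the slide describes.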

Simics Modeling Level: Networks

• Interfaced using “real” network devices
• Networks modeled at message level
– Entire messages (packets, frames, ...) delivered as a unit
• Hardware addressing used
– Ethernet MAC, 1553 Node IDs
– Does not care about higher-level protocols
– Ethernet allows IPv4, IPv6, TCP, UDP, SCP, ICMP, ...
• Any topology or addressing scheme
– Broadcast, unicast, switched, point-to-point, etc.
• Perfect network by default
– Introduce latencies
– Introduce bandwidth limits
– Introduce faults
– Introduce arbitration
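Message-level network modeling can be sketched as a link that delivers whole frames by hardware address, is perfect by default, and accepts optional latency and fault injection. The `Link` class, its `drop_filter` hook, and the short address strings are hypothetical; this is not the Simics network link API.

```python
import heapq

BROADCAST = "ff:ff:ff:ff:ff:ff"


class Link:
    def __init__(self, latency=0):
        self.latency = latency       # 0 means a perfect, instantaneous network
        self.now = 0                 # virtual time
        self.endpoints = {}          # hardware address -> inbox list
        self.in_flight = []          # heap of (deliver_time, seq, frame)
        self.seq = 0
        self.drop_filter = None      # optional fault-injection hook

    def attach(self, address):
        self.endpoints[address] = []
        return self.endpoints[address]

    def send(self, src, dst, payload):
        frame = (src, dst, payload)
        if self.drop_filter and self.drop_filter(frame):
            return                   # injected fault: the frame is lost
        heapq.heappush(self.in_flight,
                       (self.now + self.latency, self.seq, frame))
        self.seq += 1

    def advance(self, cycles):
        self.now += cycles
        while self.in_flight and self.in_flight[0][0] <= self.now:
            _, _, (src, dst, payload) = heapq.heappop(self.in_flight)
            if dst == BROADCAST:
                receivers = [a for a in self.endpoints if a != src]
            else:
                receivers = [dst] if dst in self.endpoints else []
            for address in receivers:
                # The whole message is delivered as one unit.
                self.endpoints[address].append((src, payload))


link = Link(latency=5)
inbox_a = link.attach("aa")
inbox_b = link.attach("bb")

link.send("aa", "bb", b"hello")
link.advance(4)
early = list(inbox_b)                # latency not yet elapsed
link.advance(1)
on_time = list(inbox_b)

link.drop_filter = lambda frame: frame[2] == b"bad"
link.send("aa", "bb", b"bad")
link.send("bb", BROADCAST, b"ping")
link.advance(10)
```

The link never inspects the payload, so any higher-level protocol carried in the frames works unchanged.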

Simics Modeling: No Limits

• Boards/machines:
– Single processor
– Multiprocessor
– Shared memory, local memories
• Backplane/interconnect:
– Network (Ethernet, ATM, I2C, ATCA, custom links...)
– Shared memory (PCI, PCIe, custom system...)
• System level:
– Multiple boards and machines
– Heterogeneous processors, boards, machines
• Networks:
– Any number of networks
– Mixing different network types
• Scalability:
– Always allows 64-bit memory space
– Simulation can be distributed

Modeling Your System

[Figure: the system architecture (processors, devices, memory, flash, ASIC, network, interconnect) is mapped step by step to a Simics configuration.]

1. Determine the components
• System architecture docs
• User guides
• Component manuals

2. Reuse library components
• Virtutech libraries
• Processors
• Devices
• Interconnects & networks
• System structure

3. Model unique components
• And adapt existing
• Virtutech
• Customer
• Consultant

The Modeling Process

• Determine level of abstraction
• Model devices at interfaces
– PCI interface: read and write transactions
– Memory map interface: read and write transactions
– DMA transactions into/out of memory
– Network: packets in and out
– Interrupt lines: high or low
• Minimal device state to model behavior at interfaces
• Whatever documentation a software programmer needs is what we need

Mapping a System For Modeling

[Figure: PPC 440 GP block diagram. Blocks shown: PPC440 core, UIC, DCR map and DCR registers, PLB map and PLB arbiter, DDR controller and DDR SDRAM, PCI bridge, Eth0, Eth1, MAL, external bus controller, GPIO, I2C, UART0, UART1, SRAM, flash, DMA, and clock, power, and control. The legend distinguishes three groups: blocks modeled as Simics memories, devices modeled (mostly) as dummies, and devices where function needs to be modeled.]

Network Simulation with Simics: Node View

[Figure: several simulated machines (OS, application) inside Simics, connected through a Simics network link simulation.]

• Regular OS networking API for the applications
• OS talks to the network device, like on a real machine
• Simulated machine sends packets onto the simulated network

Network Instrumentation

[Figure: simulated machines (OS, application) inside Simics, connected through a Simics network link simulation with a network instrumentation module attached.]

• Packets travel on the simulated network(s)
• Instrumentation at network core
• Instrumentation at network devices
• Packets can be inspected, killed, corrupted, delayed, bandwidth limited

Uses for Virtual Hardware

• Hardware replacement for test & development
– CapEx savings
– Capacity & capability increase: more systems available
• Early hardware availability
– Operating system bring-up & development
• Performance tuning & debugging
– Unintrusive diagnosis of bottlenecks
– Cache, TLB, disk, network profiling
• Fault injection
– Stop & crash nodes, inject network faults
– Repeatable & reliable
• Regression testing
– Automation & perfect control over system
• Scalability testing
– More hardware than in the real world

Reverse Debugging

• Stop & go back in time
– Instead of rerunning program from start
– No need to rerun and hope for bug to reoccur
– Investigate exactly what happened this time
– Breakpoints & watchpoints backwards in time
– Very powerful for parallel programs

[Figure: execution timelines with "Go forward" and "Backup" arrows. When rerunning from the start, only some runs reproduce the right error.]

Reverse Debugging Techniques

• Trace-based
– Record system execution
– Special hardware support or simulator
– Use as “tape recorder,” fixed execution observed
– Hard to extend to multiprocessors
• Simulation-based
– Record in simulator
– Replay in same simulator
– Can change state and continue execution
– More powerful solution

[Figure: the trace-based timeline allows "Backup" and "Go forward" along the recorded execution; the simulation-based timeline also allows "Backup and go somewhere else".]