An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another...

Preview:

Citation preview

TerraSwarmTerraSwarm

SponsoredbytheTerraSwarmResearchCenter,oneofsixcentersadministeredbytheSTARnetphaseoftheFocusCenterResearchProgram(FCRP)aSemiconductorResearchCorporationprogramsponsoredbyMARCOandDARPA.

AnIntegratedSimulationToolforComputerArchitectureandCyber-PhysicalSystems

HokeunKim1,2,ArminWasicek3,andEdwardA.Lee1

CyPhy'17,Seoul,Korea

1UniversityofCalifornia,Berkeley2LinkedInCorp.3TechnicalUniversityVienna

Introduction

• ManytoolsusedforCPSmodelingandsimulationemploysasimplifiedtimingmodelfor“cyber”partofCPS– Exampletools:OpenModelica,PtolemyII– E.g.,computationtime,communicationdelay

• Thesetoolsareuseful– Fasterthansimulatingoremulatingcyberpart– EnoughforCPSsimulationinmanycases

• But,sometimesweneedmorethanjustsimplifiedcomputation&communicationmodels

TerraSwarm Research Center 2

Motivation1– SideChannels

• Sidechannelattacks– Gaininginformationbyleveragingphysicalimplementationofcomputersystems

TerraSwarm Research Center 3

Brier,Clavier,andOlivier."Correlationpoweranalysiswithaleakagemodel.",CHESS2004.

Fig. 1. Upper: consecutive correlation peaks for two different reference states. Lower:for varying data (0-255), model array and measurement array taken at the time of thesecond correlation peak.

on a common bus. The address of a data word is transmitted just before itsvalue that is in turn immediately followed by the opcode of the next instructionwhich is fetched. Such a behavior can be observed on a wide variety of chipseven those implementing 16 or 32-bit architectures. Correlation rates rangingfrom 60% to more than 90% can often be obtained. Figure 2 shows an exampleof partial correlation on a 32-bit architecture: when only 4 bits are predictedamong 32, the correlation loss is in about the ratio

√8 which is consistent with

the displayed correlations.This sort of results can be observed on various technologies and implemen-

tations. Nevertheless the following restrictions have to be mentioned:

– Sometimes the reference state is systematically 0. This can be assigned to theso-called pre-charged logic where the bus is cleared between each transferredvalue. Another possible reason is that complex architectures implement sep-arated busses for data and addresses, that may prohibit certain transitions.In all those cases the Hamming weight model is recovered as a particularcase of the more general Hamming distance model.

Fig. 1. Upper: consecutive correlation peaks for two different reference states. Lower:for varying data (0-255), model array and measurement array taken at the time of thesecond correlation peak.

on a common bus. The address of a data word is transmitted just before itsvalue that is in turn immediately followed by the opcode of the next instructionwhich is fetched. Such a behavior can be observed on a wide variety of chipseven those implementing 16 or 32-bit architectures. Correlation rates rangingfrom 60% to more than 90% can often be obtained. Figure 2 shows an exampleof partial correlation on a 32-bit architecture: when only 4 bits are predictedamong 32, the correlation loss is in about the ratio

√8 which is consistent with

the displayed correlations.This sort of results can be observed on various technologies and implemen-

tations. Nevertheless the following restrictions have to be mentioned:

– Sometimes the reference state is systematically 0. This can be assigned to theso-called pre-charged logic where the bus is cleared between each transferredvalue. Another possible reason is that complex architectures implement sep-arated busses for data and addresses, that may prohibit certain transitions.In all those cases the Hamming weight model is recovered as a particularcase of the more general Hamming distance model.

(a)Hammingdistancefrom

datatoref.state

(B)Powerconsumption

Correlationof(a)and(b)– E.g.,poweranalysis

Timingdelaysarenotenough!

Motivation1– SideChannels

• ColdbootattackonDRAMs

TerraSwarm Research Center 4

Halderman,J.A.,etal.,“Lestweremember:Cold-bootattacksonencryptionkeys.“CommunicationsoftheACM,2009

ShamirandSomeren,"PlayingHideandSeekwithStoredKeys", FC99 (ConferenceonFinancialCryptography )

Looking for cryptographic keys

“Playing ’hide and seek’ with stored keys”[Shamir, van Someren 99]

FreezetheDRAMmemoryoftherunningsystemtopreventthedatafromdecaying

Readoutdataandlookforhighentropyindata(cryptographickey)

Motivation2– MOOCforCPS

• CPSclasses– Involvealotofhands-onexperiments

TerraSwarm Research Center 5

e.g.EECS149.1x,Cyber-PhysicalSystemsatUCBerkeleyhttps://www.edx.org/course/cyber-physical-systems-uc-berkeleyx-eecs149-1x

• MOOCforCPSclasses?– NotlikeotherCSclasses– AccuratemodelforCPSwouldhelp

Cyberpart

Physicalpart(environment)

Goals

• BuildingaCPSsimulatorsupportingaccuratecomputerarchitecturemodel

• Demonstrationofanopen-sourceintegratedsimulationtoolforCPSandcomputerarchitecture

• CasestudyusingDRAMpowerandthermalmodeling

TerraSwarm Research Center 6

Background– Tools

• Thegem5architecturesimulator(fromUMich)– Open-sourcepowerful,modular,flexibleandwidelyusedbothinacademiaandindustry

• Characteristics– Object-oriented,discrete-event–Modularcomponents(CPUs,Memories,Buses,Interconnects),easilyinterchangeable

– Simulatedsystem=collectionofobjects

TerraSwarm Research Center 7

Background– Tools

• PtolemyII– Anopen-sourcesoftwareforresearchoncyber-physicalsystems

– DevelopedatUCBerkeleysince1996– Supportsmodelingofboththecyberpart(computation,communication)&physicalprocess(continuousdynamics)

– Quitestable,easytolearnanduse(supportsGUI,onecanbuildamodelbydrawingcomponents)

– Basedonactor-orienteddesign– Moreinformationonhttp://ptolemy.org

TerraSwarm Research Center 8

Background– Tools

• Actor-OrientedDesigninPtolemyII– Actors

• Concurrentlyexecutedcomponents• Interactwithotheractorsthroughinput/outputports

• Modelcomputation,communication,physicalprocesses,etc.

– Directors• ImplementModelsofComputation(MoCs)

• Orchestratebehaviorofactors,forexample,wheneachactorshouldbeexecuted(=fired)

– Actorhierarchy• Anactorcanhavesub-atctors

TerraSwarm Research Center 9Claudius Ptolemaeus, Editor, System Design, Modeling, and Simulation Using Ptolemy II, Ptolemy.org, 2014.

Background– Tools

• ModelofComputation(MoC)– Asetofrulesorchestratingbehaviorofactors(e.g.,whentoexecuteactors,howactorsreacttoinputs)

10TerraSwarm Research Center

DiscreteEvent• Time-stampedevents

(e.g.timerevent,arrivalofmessages)

• Formodelingcomputationorcommunication

ContinuousTime• Sampling-basedsimulation,ODEsolvers• Formodelingphysicalprocesses(e.g.

thermaltransfer)

MultipleMoCs inasinglePtolemyIImodel

DIMM(DualIn-lineMemoryModule)

DRAM DRAMAMB

(AdvancedMemoryBuffer) DRAM DRAM

DRAMtoambient DRAMtoambientAMBtoambient

Coolingairflow

DRAMtoAMBdatatransfer

HeatdissipationtoambientAMBtoDRAMdatatransfer

Airflow

Background– DRAMModel

• DRAMthermalmodelbyLinetal.(ISCA`07)– Powerisproportionaltothroughput(GB/s)– FactorsthataffectDRAMtemperature

TerraSwarm Research Center 11

1.Physicalstructure

2.Heatdissipationfromcomponents

3.Coolingeffect

Linetal.,“ThermalmodelingandmanagementofDRAMmemorysystems”ISCA’07

Approach- IntegratedToolOverview

TerraSwarm Research Center 12

PtolemyIIModel

gem5Simulator

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

DiscreteEvent

DiscreteEvent

ContinuousTimeDiscreteEvent

Approach- gem5asCyberPartofCPS

TerraSwarm Research Center 13

PtolemyIIModel

gem5Simulator

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

(1)

Approach– Configuringgem5Simulator

• Implementationofgem5– Python– high-levelobjectconfiguration&simulation– C++– low-levelobjectimplementation(forperformance)

• Thegem5Simulatorpythonscripts– Modifyexecutionscriptsforperiodicexecution– gem5runsforgivencyclesandstops– ResumeafterPtolemyIImodelruns

• DRAMcomponent– AddDPRINTFfunctionstoDRAMcomponent– Printoutcommandandcycleinformation

TerraSwarm Research Center 14

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 15

PtolemyIIModel

gem5Simulator

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

(2)

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 16

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

gem5%Simulator%

Named%pipe%2%

Named%pipe%1%

Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#

!!!#

“Fire”%(Run#simula1on##for#N#cycles)#

“No;fy”%(Simula1on#finished#&#results#ready)#

Ptolemy%II%Model%

Gem5%Wrapper%Actor%

Store#simula1on#results#

Load#simula1on#results#

Javacustomactorgem5blocksonreadWrapperfire()blocksonread

Simulationinformationtransferred

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 17

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

gem5%Simulator%

Named%pipe%2%

Named%pipe%1%

Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#

!!!#

“Fire”%(Run#simula1on##for#N#cycles)#

“No;fy”%(Simula1on#finished#&#results#ready)#

Ptolemy%II%Model%

Gem5%Wrapper%Actor%

Store#simula1on#results#

Load#simula1on#results#

(1)Gem5Wrapperinitialize()triggersgem5

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 18

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

gem5%Simulator%

Named%pipe%2%

Named%pipe%1%

Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#

!!!#

“Fire”%(Run#simula1on##for#N#cycles)#

“No;fy”%(Simula1on#finished#&#results#ready)#

Ptolemy%II%Model%

Gem5%Wrapper%Actor%

Store#simula1on#results#

Load#simula1on#results#

(2)gem5runs

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 19

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

gem5%Simulator%

Named%pipe%2%

Named%pipe%1%

Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#

!!!#

“Fire”%(Run#simula1on##for#N#cycles)#

“No;fy”%(Simula1on#finished#&#results#ready)#

Ptolemy%II%Model%

Gem5%Wrapper%Actor%

Store#simula1on#results#

Load#simula1on#results#

(3)gem5finishesandstops

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 20

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

gem5%Simulator%

Named%pipe%2%

Named%pipe%1%

Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#

!!!#

“Fire”%(Run#simula1on##for#N#cycles)#

“No;fy”%(Simula1on#finished#&#results#ready)#

Ptolemy%II%Model%

Gem5%Wrapper%Actor%

Store#simula1on#results#

Load#simula1on#results#

(4)Gem5Wrapperfire()returns

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 21

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

gem5%Simulator%

Named%pipe%2%

Named%pipe%1%

Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#

!!!#

“Fire”%(Run#simula1on##for#N#cycles)#

“No;fy”%(Simula1on#finished#&#results#ready)#

Ptolemy%II%Model%

Gem5%Wrapper%Actor%

Store#simula1on#results#

Load#simula1on#results#

(5)Gem5Wrapperpostfire()triggersgem5again

Approach– ADRAMbehavioralmodelinPtolemyII

TerraSwarm Research Center 22

PtolemyIIModel

gem5Simulator

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

(3)

Approach– ADRAMBehavioralModelinPtolemyII

TerraSwarm Research Center 23

Anarrayofrecords

{{bank=5,cmd ="READ",channel =0,service_time =107988},{bank=5,cmd ="READ",channel =0,service_time =108192},{bank=5,cmd ="READ",channel =0,service_time =108418},{bank=5,cmd ="READ",channel =1,service_time =109030},{bank=6,cmd ="WRITE",channel =0,service_time =109078}} Calculatethroughputover

amovingtimewindow

Processcommandsusingservice_time field

Approach– DRAMPower&ThermalModelinPtolemyII

TerraSwarm Research Center 24

PtolemyIIModel

gem5Simulator

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

(4)

Approach– DRAMPower&ThermalModelinPtolemyII

• CMOSDevicepower=Staticpower+Dynamicpower

TerraSwarm Research Center 25

1

Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling

Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley

hokeunkim@eecs.berkeley.edu, arminw@berkeley.edu

P

device

= P

DRAM static

+ P

DRAM dynamic

(1)

P

DRAM

= P

DRAM static

+↵1⇥Throughput

read

+↵2⇥Throughput

write

(2)P

AMB

= P

AMB idle

+�⇥Throughput

Bypass

+�⇥Throughput

Local

(3)

T

AMB

= T

A

+ P

AMB

⇥ AMB

+ P

DRAM

⇥ DRAM AMB

(4)T

DRAM

= T

A

+P

AMB

⇥ AMB DRAM

+P

DRAM

⇥ DRAM

(5)

T (t+4t)� T (t) = (Tstable

� T (t))(1� e

�4t⌧ ) (6)

1

Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling

Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley

hokeunkim@eecs.berkeley.edu, arminw@berkeley.edu

P

device

= P

DRAM static

+ P

DRAM dynamic

(1)

P

DRAM

= P

DRAM static

+↵1⇥Throughput

read

+↵2⇥Throughput

write

(2)P

AMB

= P

AMB idle

+�⇥Throughput

Bypass

+�⇥Throughput

Local

(3)

T

AMB

= T

A

+ P

AMB

⇥ AMB

+ P

DRAM

⇥ DRAM AMB

(4)T

DRAM

= T

A

+P

AMB

⇥ AMB DRAM

+P

DRAM

⇥ DRAM

(5)

T (t+4t)� T (t) = (Tstable

� T (t))(1� e

�4t⌧ ) (6)

Equations&coefficientsarefromLinetal.,ISCA`07

Staticpower,constant,measuredvalue

Dynamicpower

Coefficients(power/throughput),constant,measuredvalue

Accessestolocalornon-localchannel

• DRAMdynamicpower∝ Throughput

1

Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling

Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley

hokeunkim@eecs.berkeley.edu, arminw@berkeley.edu

P

device

= P

DRAM static

+ P

DRAM dynamic

(1)

P

DRAM

= P

DRAM static

+↵1⇥Throughput

read

+↵2⇥Throughput

write

(2)P

AMB

= P

AMB idle

+�⇥Throughput

Bypass

+�⇥Throughput

Local

(3)

T

AMB

= T

A

+ P

AMB

⇥ AMB

+ P

DRAM

⇥ DRAM AMB

(4)T

DRAM

= T

A

+P

AMB

⇥ AMB DRAM

+P

DRAM

⇥ DRAM

(5)

T (t+4t)� T (t) = (Tstable

� T (t))(1� e

�4t⌧ ) (6)

Approach– DRAMPower&ThermalModelinPtolemyII

TerraSwarm Research Center 26

• DRAMstabletemperaturefromDRAMpower

• CurrentDRAMtemperature

1

Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling

Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley

hokeunkim@eecs.berkeley.edu, arminw@berkeley.edu

P

device

= P

DRAM static

+ P

DRAM dynamic

(1)

P

DRAM

= P

DRAM static

+↵1⇥Throughput

read

+↵2⇥Throughput

write

(2)P

AMB

= P

AMB idle

+�⇥Throughput

Bypass

+�⇥Throughput

Local

(3)

T

AMB

= T

A

+ P

AMB

⇥ AMB

+ P

DRAM

⇥ DRAM AMB

(4)T

DRAM

= T

A

+P

AMB

⇥ AMB DRAM

+P

DRAM

⇥ DRAM

(5)

T (t+4t)� T (t) = (Tstable

� T (t))(1� e

�4t⌧ ) (6)

1

Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling

Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley

hokeunkim@eecs.berkeley.edu, arminw@berkeley.edu

P

device

= P

DRAM static

+ P

DRAM dynamic

(1)

P

DRAM

= P

DRAM static

+↵1⇥Throughput

read

+↵2⇥Throughput

write

(2)P

AMB

= P

AMB idle

+�⇥Throughput

Bypass

+�⇥Throughput

Local

(3)

T

AMB

= T

A

+ P

AMB

⇥ AMB

+ P

DRAM

⇥ DRAM AMB

(4)T

DRAM

= T

A

+P

AMB

⇥ AMB DRAM

+P

DRAM

⇥ DRAM

(5)

T (t+4t)� T (t) = (Tstable

� T (t))(1� e

�4t⌧ ) (6)

Stabletemperatures

Thermalresistance(temperature/power),constant,measuredvalue

Ambienttemperature

1

Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling

Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley

hokeunkim@eecs.berkeley.edu, arminw@berkeley.edu

P

device

= P

DRAM static

+ P

DRAM dynamic

(1)

P

DRAM

= P

DRAM static

+↵1⇥Throughput

read

+↵2⇥Throughput

write

(2)P

AMB

= P

AMB idle

+�⇥Throughput

Bypass

+�⇥Throughput

Local

(3)

T

AMB

= T

A

+ P

AMB

⇥ AMB

+ P

DRAM

⇥ DRAM AMB

(4)T

DRAM

= T

A

+P

AMB

⇥ AMB DRAM

+P

DRAM

⇥ DRAM

(5)

T (t+4t)� T (t) = (Tstable

� T (t))(1� e

�4t⌧ ) (6)

Currenttemperature

Equations&coefficientsarefromLinetal.,ISCA`07

Rateoftemperaturechangeconstant,measuredvalue

Approach– DRAMPower&ThermalModelinPtolemyII

TerraSwarm Research Center 27

(a)$

MoC

InputsPower

Stabletemperature

Currenttemperature

Approach– DRAMPower&ThermalModelinPtolemyII

TerraSwarm Research Center 28

• AMB/DRAMpowertendencyexampleConvergestostable

temperature(Pstatic +Pdynamic) DRAMbecomesidle

Convergestostabletemperature

(Pstatic)

DRAMaccessstartsfromTambient

Temperature(°C)

Time(Sec)

ExperimentsandResults– ExperimentalSetup

• Experimentedon– Differentcacheconfigurations– Differentsoftwareworkloads

• Tomeasure– AverageDRAM/AMBpower– PeakDRAM/AMBtemperaturereachedduringsimulation(0.1secinsimulatedtime)

TerraSwarm Research Center 29

ExperimentsandResults– ExperimentalSetup

• gem5configurations(exceptcaches)– ISA– ARM– CPUType– TimingSimpleCPU:Stallsoneveryloadmemoryaccess.

– Clockrate– CPU:1GHz/System:1GHz– Off-chipDRAMmemory:DDR3SDRAMwithadatarateof1600MHzandabuswidthof16bits.

– Cacheblocksize– 64bytes

TerraSwarm Research Center 30

ExperimentsandResults– PowerandTemperatureResults

• Benchmark– Top5memory-intensiveprogramsfromMiBench• wherememory-intensityisdefinedas#memoryaccesses(read+write)/instruction

TerraSwarm Research Center 31

MiBenchprograms writes reads total instructions

executed memory

intensity (%)

cjpeg_large 6,183 74,966 1,000,000 8.11

rijndael_large 2,558 68,458 1,000,000 7.1

typeset_small 12,843 55,963 1,000,000 6.88

dijkstra_large 4,942 59,198 1,000,000 6.41

patricia_large 4,255 49,198 1,000,000 5.35

ExperimentsandResults– PowerandTemperatureResults

• Powerandtemperatureresultsfordifferentcacheconfigurations– (workload:cjpeg_large)

TerraSwarm Research Center 32

Cache size options (KB) Average power (mW) Maximum temperature increase (10-6 °C)

L1 (I/D) L2 DRAM AMB DRAM AMB

16 N/A 1,057 4,027 2.67 6.05

32 N/A 1,023 4,011 2.63 5.93

64 N/A 1,000 4,008 2.46 5.51

32 128 996 4,006 2.17 4.86

32 256 995 4,006 1.99 4.47

ExperimentsandResults– PowerandTemperatureResults

• Temperatureresultsfordifferentworkloads– (Cacheconfiguration:L1:16kB,L2:N/A)

TerraSwarm Research Center 33

2.7$

3.6$

5.1$

2.1$

2.0$

6.1$

8.3$

12.2$

4.8$

4.8$

0.0$

2.0$

4.0$

6.0$

8.0$

10.0$

12.0$

14.0$

cjpeg_large$$

rijndael_large$$

typeset_small$$

dijkstra_large$$

patricia_large$$

Maxim

um'te

mpe

rature'

increase'(1

026C

°)'

'

DRAM$$

AMB$$

Highmemoryintensity

Lowmemoryintensity

Highbypassthroughput->HighAMBpower

Intensivewriteaccesses

ToolDemonstration

• Gem5tunedfortoolintegration– https://github.com/gem5-ptolemy/

• PtolemyII– http://ptolemy.org• Version11.0– Developmentversion

– Casestudyexamplemodel:• ptolemy/actor/lib/gem5/demo/DramThermalModel.xml

TerraSwarm Research Center 34

Conclusions

• Summary– Thegem5architecturesimulatorisintegratedintoPtolemyIIasancomputerarchitecturalaspectswithhigheraccuracy

– Experimentsshowusefulnessoftheapproach• Futurework– Morearchitecturalinformationfromgem5– Moreapplicationsfortheproposedapproach

• Formoreinformation– https://github.com/gem5-ptolemy/– http://ptolemy.org

TerraSwarm Research Center 35

Recommended