35
TerraSwarm TerraSwarm Sponsored by the TerraSwarm Research Center, one of six centers administered by the STARnet phase of the Focus Center Research Program (FCRP) a Semiconductor Research Corporation program sponsored by MARCO and DARPA. An Integrated Simulation Tool for Computer Architecture and Cyber-Physical Systems Hokeun Kim 1,2 , Armin Wasicek 3 , and Edward A. Lee 1 CyPhy'17, Seoul, Korea 1 University of California, Berkeley 2 LinkedIn Corp. 3 Technical University Vienna

An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

TerraSwarmTerraSwarm

SponsoredbytheTerraSwarmResearchCenter,oneofsixcentersadministeredbytheSTARnetphaseoftheFocusCenterResearchProgram(FCRP)aSemiconductorResearchCorporationprogramsponsoredbyMARCOandDARPA.

AnIntegratedSimulationToolforComputerArchitectureandCyber-PhysicalSystems

HokeunKim1,2,ArminWasicek3,andEdwardA.Lee1

CyPhy'17,Seoul,Korea

1UniversityofCalifornia,Berkeley2LinkedInCorp.3TechnicalUniversityVienna

Page 2: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Introduction

• ManytoolsusedforCPSmodelingandsimulationemploysasimplifiedtimingmodelfor“cyber”partofCPS– Exampletools:OpenModelica,PtolemyII– E.g.,computationtime,communicationdelay

• Thesetoolsareuseful– Fasterthansimulatingoremulatingcyberpart– EnoughforCPSsimulationinmanycases

• But,sometimesweneedmorethanjustsimplifiedcomputation&communicationmodels

TerraSwarm Research Center 2

Page 3: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Motivation1– SideChannels

• Sidechannelattacks– Gaininginformationbyleveragingphysicalimplementationofcomputersystems

TerraSwarm Research Center 3

Brier,Clavier,andOlivier."Correlationpoweranalysiswithaleakagemodel.",CHESS2004.

Fig. 1. Upper: consecutive correlation peaks for two different reference states. Lower:for varying data (0-255), model array and measurement array taken at the time of thesecond correlation peak.

on a common bus. The address of a data word is transmitted just before itsvalue that is in turn immediately followed by the opcode of the next instructionwhich is fetched. Such a behavior can be observed on a wide variety of chipseven those implementing 16 or 32-bit architectures. Correlation rates rangingfrom 60% to more than 90% can often be obtained. Figure 2 shows an exampleof partial correlation on a 32-bit architecture: when only 4 bits are predictedamong 32, the correlation loss is in about the ratio

√8 which is consistent with

the displayed correlations.This sort of results can be observed on various technologies and implemen-

tations. Nevertheless the following restrictions have to be mentioned:

– Sometimes the reference state is systematically 0. This can be assigned to theso-called pre-charged logic where the bus is cleared between each transferredvalue. Another possible reason is that complex architectures implement sep-arated busses for data and addresses, that may prohibit certain transitions.In all those cases the Hamming weight model is recovered as a particularcase of the more general Hamming distance model.

Fig. 1. Upper: consecutive correlation peaks for two different reference states. Lower:for varying data (0-255), model array and measurement array taken at the time of thesecond correlation peak.

on a common bus. The address of a data word is transmitted just before itsvalue that is in turn immediately followed by the opcode of the next instructionwhich is fetched. Such a behavior can be observed on a wide variety of chipseven those implementing 16 or 32-bit architectures. Correlation rates rangingfrom 60% to more than 90% can often be obtained. Figure 2 shows an exampleof partial correlation on a 32-bit architecture: when only 4 bits are predictedamong 32, the correlation loss is in about the ratio

√8 which is consistent with

the displayed correlations.This sort of results can be observed on various technologies and implemen-

tations. Nevertheless the following restrictions have to be mentioned:

– Sometimes the reference state is systematically 0. This can be assigned to theso-called pre-charged logic where the bus is cleared between each transferredvalue. Another possible reason is that complex architectures implement sep-arated busses for data and addresses, that may prohibit certain transitions.In all those cases the Hamming weight model is recovered as a particularcase of the more general Hamming distance model.

(a)Hammingdistancefrom

datatoref.state

(B)Powerconsumption

Correlationof(a)and(b)– E.g.,poweranalysis

Timingdelaysarenotenough!

Page 4: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Motivation1– SideChannels

• ColdbootattackonDRAMs

TerraSwarm Research Center 4

Halderman,J.A.,etal.,“Lestweremember:Cold-bootattacksonencryptionkeys.“CommunicationsoftheACM,2009

ShamirandSomeren,"PlayingHideandSeekwithStoredKeys", FC99 (ConferenceonFinancialCryptography )

Looking for cryptographic keys

“Playing ’hide and seek’ with stored keys”[Shamir, van Someren 99]

FreezetheDRAMmemoryoftherunningsystemtopreventthedatafromdecaying

Readoutdataandlookforhighentropyindata(cryptographickey)

Page 5: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Motivation2– MOOCforCPS

• CPSclasses– Involvealotofhands-onexperiments

TerraSwarm Research Center 5

e.g.EECS149.1x,Cyber-PhysicalSystemsatUCBerkeleyhttps://www.edx.org/course/cyber-physical-systems-uc-berkeleyx-eecs149-1x

• MOOCforCPSclasses?– NotlikeotherCSclasses– AccuratemodelforCPSwouldhelp

Cyberpart

Physicalpart(environment)

Page 6: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Goals

• BuildingaCPSsimulatorsupportingaccuratecomputerarchitecturemodel

• Demonstrationofanopen-sourceintegratedsimulationtoolforCPSandcomputerarchitecture

• CasestudyusingDRAMpowerandthermalmodeling

TerraSwarm Research Center 6

Page 7: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Background– Tools

• Thegem5architecturesimulator(fromUMich)– Open-sourcepowerful,modular,flexibleandwidelyusedbothinacademiaandindustry

• Characteristics– Object-oriented,discrete-event–Modularcomponents(CPUs,Memories,Buses,Interconnects),easilyinterchangeable

– Simulatedsystem=collectionofobjects

TerraSwarm Research Center 7

Page 8: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Background– Tools

• PtolemyII– Anopen-sourcesoftwareforresearchoncyber-physicalsystems

– DevelopedatUCBerkeleysince1996– Supportsmodelingofboththecyberpart(computation,communication)&physicalprocess(continuousdynamics)

– Quitestable,easytolearnanduse(supportsGUI,onecanbuildamodelbydrawingcomponents)

– Basedonactor-orienteddesign– Moreinformationonhttp://ptolemy.org

TerraSwarm Research Center 8

Page 9: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Background– Tools

• Actor-OrientedDesigninPtolemyII– Actors

• Concurrentlyexecutedcomponents• Interactwithotheractorsthroughinput/outputports

• Modelcomputation,communication,physicalprocesses,etc.

– Directors• ImplementModelsofComputation(MoCs)

• Orchestratebehaviorofactors,forexample,wheneachactorshouldbeexecuted(=fired)

– Actorhierarchy• Anactorcanhavesub-atctors

TerraSwarm Research Center 9Claudius Ptolemaeus, Editor, System Design, Modeling, and Simulation Using Ptolemy II, Ptolemy.org, 2014.

Page 10: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Background– Tools

• ModelofComputation(MoC)– Asetofrulesorchestratingbehaviorofactors(e.g.,whentoexecuteactors,howactorsreacttoinputs)

10TerraSwarm Research Center

DiscreteEvent• Time-stampedevents

(e.g.timerevent,arrivalofmessages)

• Formodelingcomputationorcommunication

ContinuousTime• Sampling-basedsimulation,ODEsolvers• Formodelingphysicalprocesses(e.g.

thermaltransfer)

MultipleMoCs inasinglePtolemyIImodel

Page 11: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

DIMM(DualIn-lineMemoryModule)

DRAM DRAMAMB

(AdvancedMemoryBuffer) DRAM DRAM

DRAMtoambient DRAMtoambientAMBtoambient

Coolingairflow

DRAMtoAMBdatatransfer

HeatdissipationtoambientAMBtoDRAMdatatransfer

Airflow

Background– DRAMModel

• DRAMthermalmodelbyLinetal.(ISCA`07)– Powerisproportionaltothroughput(GB/s)– FactorsthataffectDRAMtemperature

TerraSwarm Research Center 11

1.Physicalstructure

2.Heatdissipationfromcomponents

3.Coolingeffect

Linetal.,“ThermalmodelingandmanagementofDRAMmemorysystems”ISCA’07

Page 12: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach- IntegratedToolOverview

TerraSwarm Research Center 12

PtolemyIIModel

gem5Simulator

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

DiscreteEvent

DiscreteEvent

ContinuousTimeDiscreteEvent

Page 13: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach- gem5asCyberPartofCPS

TerraSwarm Research Center 13

PtolemyIIModel

gem5Simulator

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

(1)

Page 14: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– Configuringgem5Simulator

• Implementationofgem5– Python– high-levelobjectconfiguration&simulation– C++– low-levelobjectimplementation(forperformance)

• Thegem5Simulatorpythonscripts– Modifyexecutionscriptsforperiodicexecution– gem5runsforgivencyclesandstops– ResumeafterPtolemyIImodelruns

• DRAMcomponent– AddDPRINTFfunctionstoDRAMcomponent– Printoutcommandandcycleinformation

TerraSwarm Research Center 14

Page 15: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 15

PtolemyIIModel

gem5Simulator

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

(2)

Page 16: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 16

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

gem5%Simulator%

Named%pipe%2%

Named%pipe%1%

Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#

!!!#

“Fire”%(Run#simula1on##for#N#cycles)#

“No;fy”%(Simula1on#finished#&#results#ready)#

Ptolemy%II%Model%

Gem5%Wrapper%Actor%

Store#simula1on#results#

Load#simula1on#results#

Javacustomactorgem5blocksonreadWrapperfire()blocksonread

Simulationinformationtransferred

Page 17: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 17

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

gem5%Simulator%

Named%pipe%2%

Named%pipe%1%

Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#

!!!#

“Fire”%(Run#simula1on##for#N#cycles)#

“No;fy”%(Simula1on#finished#&#results#ready)#

Ptolemy%II%Model%

Gem5%Wrapper%Actor%

Store#simula1on#results#

Load#simula1on#results#

(1)Gem5Wrapperinitialize()triggersgem5

Page 18: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 18

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

gem5%Simulator%

Named%pipe%2%

Named%pipe%1%

Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#

!!!#

“Fire”%(Run#simula1on##for#N#cycles)#

“No;fy”%(Simula1on#finished#&#results#ready)#

Ptolemy%II%Model%

Gem5%Wrapper%Actor%

Store#simula1on#results#

Load#simula1on#results#

(2)gem5runs

Page 19: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 19

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

gem5%Simulator%

Named%pipe%2%

Named%pipe%1%

Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#

!!!#

“Fire”%(Run#simula1on##for#N#cycles)#

“No;fy”%(Simula1on#finished#&#results#ready)#

Ptolemy%II%Model%

Gem5%Wrapper%Actor%

Store#simula1on#results#

Load#simula1on#results#

(3)gem5finishesandstops

Page 20: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 20

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

gem5%Simulator%

Named%pipe%2%

Named%pipe%1%

Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#

!!!#

“Fire”%(Run#simula1on##for#N#cycles)#

“No;fy”%(Simula1on#finished#&#results#ready)#

Ptolemy%II%Model%

Gem5%Wrapper%Actor%

Store#simula1on#results#

Load#simula1on#results#

(4)Gem5Wrapperfire()returns

Page 21: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– Communicationbetweengem5&PtolemyII

TerraSwarm Research Center 21

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

gem5%Simulator%

Named%pipe%2%

Named%pipe%1%

Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#

!!!#

“Fire”%(Run#simula1on##for#N#cycles)#

“No;fy”%(Simula1on#finished#&#results#ready)#

Ptolemy%II%Model%

Gem5%Wrapper%Actor%

Store#simula1on#results#

Load#simula1on#results#

(5)Gem5Wrapperpostfire()triggersgem5again

Page 22: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– ADRAMbehavioralmodelinPtolemyII

TerraSwarm Research Center 22

PtolemyIIModel

gem5Simulator

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

(3)

Page 23: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– ADRAMBehavioralModelinPtolemyII

TerraSwarm Research Center 23

Anarrayofrecords

{{bank=5,cmd ="READ",channel =0,service_time =107988},{bank=5,cmd ="READ",channel =0,service_time =108192},{bank=5,cmd ="READ",channel =0,service_time =108418},{bank=5,cmd ="READ",channel =1,service_time =109030},{bank=6,cmd ="WRITE",channel =0,service_time =109078}} Calculatethroughputover

amovingtimewindow

Processcommandsusingservice_time field

Page 24: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– DRAMPower&ThermalModelinPtolemyII

TerraSwarm Research Center 24

PtolemyIIModel

gem5Simulator

L1#D#Cache#

L1#I#Cache#

CPU#

DRAM#

(4)

Page 25: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– DRAMPower&ThermalModelinPtolemyII

• CMOSDevicepower=Staticpower+Dynamicpower

TerraSwarm Research Center 25

1

Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling

Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley

[email protected], [email protected]

P

device

= P

DRAM static

+ P

DRAM dynamic

(1)

P

DRAM

= P

DRAM static

+↵1⇥Throughput

read

+↵2⇥Throughput

write

(2)P

AMB

= P

AMB idle

+�⇥Throughput

Bypass

+�⇥Throughput

Local

(3)

T

AMB

= T

A

+ P

AMB

⇥ AMB

+ P

DRAM

⇥ DRAM AMB

(4)T

DRAM

= T

A

+P

AMB

⇥ AMB DRAM

+P

DRAM

⇥ DRAM

(5)

T (t+4t)� T (t) = (Tstable

� T (t))(1� e

�4t⌧ ) (6)

1

Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling

Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley

[email protected], [email protected]

P

device

= P

DRAM static

+ P

DRAM dynamic

(1)

P

DRAM

= P

DRAM static

+↵1⇥Throughput

read

+↵2⇥Throughput

write

(2)P

AMB

= P

AMB idle

+�⇥Throughput

Bypass

+�⇥Throughput

Local

(3)

T

AMB

= T

A

+ P

AMB

⇥ AMB

+ P

DRAM

⇥ DRAM AMB

(4)T

DRAM

= T

A

+P

AMB

⇥ AMB DRAM

+P

DRAM

⇥ DRAM

(5)

T (t+4t)� T (t) = (Tstable

� T (t))(1� e

�4t⌧ ) (6)

Equations&coefficientsarefromLinetal.,ISCA`07

Staticpower,constant,measuredvalue

Dynamicpower

Coefficients(power/throughput),constant,measuredvalue

Accessestolocalornon-localchannel

• DRAMdynamicpower∝ Throughput

1

Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling

Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley

[email protected], [email protected]

P

device

= P

DRAM static

+ P

DRAM dynamic

(1)

P

DRAM

= P

DRAM static

+↵1⇥Throughput

read

+↵2⇥Throughput

write

(2)P

AMB

= P

AMB idle

+�⇥Throughput

Bypass

+�⇥Throughput

Local

(3)

T

AMB

= T

A

+ P

AMB

⇥ AMB

+ P

DRAM

⇥ DRAM AMB

(4)T

DRAM

= T

A

+P

AMB

⇥ AMB DRAM

+P

DRAM

⇥ DRAM

(5)

T (t+4t)� T (t) = (Tstable

� T (t))(1� e

�4t⌧ ) (6)

Page 26: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– DRAMPower&ThermalModelinPtolemyII

TerraSwarm Research Center 26

• DRAMstabletemperaturefromDRAMpower

• CurrentDRAMtemperature

1

Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling

Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley

[email protected], [email protected]

P

device

= P

DRAM static

+ P

DRAM dynamic

(1)

P

DRAM

= P

DRAM static

+↵1⇥Throughput

read

+↵2⇥Throughput

write

(2)P

AMB

= P

AMB idle

+�⇥Throughput

Bypass

+�⇥Throughput

Local

(3)

T

AMB

= T

A

+ P

AMB

⇥ AMB

+ P

DRAM

⇥ DRAM AMB

(4)T

DRAM

= T

A

+P

AMB

⇥ AMB DRAM

+P

DRAM

⇥ DRAM

(5)

T (t+4t)� T (t) = (Tstable

� T (t))(1� e

�4t⌧ ) (6)

1

Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling

Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley

[email protected], [email protected]

P

device

= P

DRAM static

+ P

DRAM dynamic

(1)

P

DRAM

= P

DRAM static

+↵1⇥Throughput

read

+↵2⇥Throughput

write

(2)P

AMB

= P

AMB idle

+�⇥Throughput

Bypass

+�⇥Throughput

Local

(3)

T

AMB

= T

A

+ P

AMB

⇥ AMB

+ P

DRAM

⇥ DRAM AMB

(4)T

DRAM

= T

A

+P

AMB

⇥ AMB DRAM

+P

DRAM

⇥ DRAM

(5)

T (t+4t)� T (t) = (Tstable

� T (t))(1� e

�4t⌧ ) (6)

Stabletemperatures

Thermalresistance(temperature/power),constant,measuredvalue

Ambienttemperature

1

Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling

Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley

[email protected], [email protected]

P

device

= P

DRAM static

+ P

DRAM dynamic

(1)

P

DRAM

= P

DRAM static

+↵1⇥Throughput

read

+↵2⇥Throughput

write

(2)P

AMB

= P

AMB idle

+�⇥Throughput

Bypass

+�⇥Throughput

Local

(3)

T

AMB

= T

A

+ P

AMB

⇥ AMB

+ P

DRAM

⇥ DRAM AMB

(4)T

DRAM

= T

A

+P

AMB

⇥ AMB DRAM

+P

DRAM

⇥ DRAM

(5)

T (t+4t)� T (t) = (Tstable

� T (t))(1� e

�4t⌧ ) (6)

Currenttemperature

Equations&coefficientsarefromLinetal.,ISCA`07

Rateoftemperaturechangeconstant,measuredvalue

Page 27: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– DRAMPower&ThermalModelinPtolemyII

TerraSwarm Research Center 27

(a)$

MoC

InputsPower

Stabletemperature

Currenttemperature

Page 28: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Approach– DRAMPower&ThermalModelinPtolemyII

TerraSwarm Research Center 28

• AMB/DRAMpowertendencyexampleConvergestostable

temperature(Pstatic +Pdynamic) DRAMbecomesidle

Convergestostabletemperature

(Pstatic)

DRAMaccessstartsfromTambient

Temperature(°C)

Time(Sec)

Page 29: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

ExperimentsandResults– ExperimentalSetup

• Experimentedon– Differentcacheconfigurations– Differentsoftwareworkloads

• Tomeasure– AverageDRAM/AMBpower– PeakDRAM/AMBtemperaturereachedduringsimulation(0.1secinsimulatedtime)

TerraSwarm Research Center 29

Page 30: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

ExperimentsandResults– ExperimentalSetup

• gem5configurations(exceptcaches)– ISA– ARM– CPUType– TimingSimpleCPU:Stallsoneveryloadmemoryaccess.

– Clockrate– CPU:1GHz/System:1GHz– Off-chipDRAMmemory:DDR3SDRAMwithadatarateof1600MHzandabuswidthof16bits.

– Cacheblocksize– 64bytes

TerraSwarm Research Center 30

Page 31: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

ExperimentsandResults– PowerandTemperatureResults

• Benchmark– Top5memory-intensiveprogramsfromMiBench• wherememory-intensityisdefinedas#memoryaccesses(read+write)/instruction

TerraSwarm Research Center 31

MiBenchprograms writes reads total instructions

executed memory

intensity (%)

cjpeg_large 6,183 74,966 1,000,000 8.11

rijndael_large 2,558 68,458 1,000,000 7.1

typeset_small 12,843 55,963 1,000,000 6.88

dijkstra_large 4,942 59,198 1,000,000 6.41

patricia_large 4,255 49,198 1,000,000 5.35

Page 32: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

ExperimentsandResults– PowerandTemperatureResults

• Powerandtemperatureresultsfordifferentcacheconfigurations– (workload:cjpeg_large)

TerraSwarm Research Center 32

Cache size options (KB) Average power (mW) Maximum temperature increase (10-6 °C)

L1 (I/D) L2 DRAM AMB DRAM AMB

16 N/A 1,057 4,027 2.67 6.05

32 N/A 1,023 4,011 2.63 5.93

64 N/A 1,000 4,008 2.46 5.51

32 128 996 4,006 2.17 4.86

32 256 995 4,006 1.99 4.47

Page 33: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

ExperimentsandResults– PowerandTemperatureResults

• Temperatureresultsfordifferentworkloads– (Cacheconfiguration:L1:16kB,L2:N/A)

TerraSwarm Research Center 33

2.7$

3.6$

5.1$

2.1$

2.0$

6.1$

8.3$

12.2$

4.8$

4.8$

0.0$

2.0$

4.0$

6.0$

8.0$

10.0$

12.0$

14.0$

cjpeg_large$$

rijndael_large$$

typeset_small$$

dijkstra_large$$

patricia_large$$

Maxim

um'te

mpe

rature'

increase'(1

026C

°)'

'

DRAM$$

AMB$$

Highmemoryintensity

Lowmemoryintensity

Highbypassthroughput->HighAMBpower

Intensivewriteaccesses

Page 34: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

ToolDemonstration

• Gem5tunedfortoolintegration– https://github.com/gem5-ptolemy/

• PtolemyII– http://ptolemy.org• Version11.0– Developmentversion

– Casestudyexamplemodel:• ptolemy/actor/lib/gem5/demo/DramThermalModel.xml

TerraSwarm Research Center 34

Page 35: An Integrated Simulation Tool for Computer Architecture and … · 2019-12-01 · value. Another possible reason is that complex architectures implement sep-arated busses for data

Conclusions

• Summary– Thegem5architecturesimulatorisintegratedintoPtolemyIIasancomputerarchitecturalaspectswithhigheraccuracy

– Experimentsshowusefulnessoftheapproach• Futurework– Morearchitecturalinformationfromgem5– Moreapplicationsfortheproposedapproach

• Formoreinformation– https://github.com/gem5-ptolemy/– http://ptolemy.org

TerraSwarm Research Center 35