Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
TerraSwarmTerraSwarm
SponsoredbytheTerraSwarmResearchCenter,oneofsixcentersadministeredbytheSTARnetphaseoftheFocusCenterResearchProgram(FCRP)aSemiconductorResearchCorporationprogramsponsoredbyMARCOandDARPA.
AnIntegratedSimulationToolforComputerArchitectureandCyber-PhysicalSystems
HokeunKim1,2,ArminWasicek3,andEdwardA.Lee1
CyPhy'17,Seoul,Korea
1UniversityofCalifornia,Berkeley2LinkedInCorp.3TechnicalUniversityVienna
Introduction
• ManytoolsusedforCPSmodelingandsimulationemploysasimplifiedtimingmodelfor“cyber”partofCPS– Exampletools:OpenModelica,PtolemyII– E.g.,computationtime,communicationdelay
• Thesetoolsareuseful– Fasterthansimulatingoremulatingcyberpart– EnoughforCPSsimulationinmanycases
• But,sometimesweneedmorethanjustsimplifiedcomputation&communicationmodels
TerraSwarm Research Center 2
Motivation1– SideChannels
• Sidechannelattacks– Gaininginformationbyleveragingphysicalimplementationofcomputersystems
TerraSwarm Research Center 3
Brier,Clavier,andOlivier."Correlationpoweranalysiswithaleakagemodel.",CHESS2004.
Fig. 1. Upper: consecutive correlation peaks for two different reference states. Lower:for varying data (0-255), model array and measurement array taken at the time of thesecond correlation peak.
on a common bus. The address of a data word is transmitted just before itsvalue that is in turn immediately followed by the opcode of the next instructionwhich is fetched. Such a behavior can be observed on a wide variety of chipseven those implementing 16 or 32-bit architectures. Correlation rates rangingfrom 60% to more than 90% can often be obtained. Figure 2 shows an exampleof partial correlation on a 32-bit architecture: when only 4 bits are predictedamong 32, the correlation loss is in about the ratio
√8 which is consistent with
the displayed correlations.This sort of results can be observed on various technologies and implemen-
tations. Nevertheless the following restrictions have to be mentioned:
– Sometimes the reference state is systematically 0. This can be assigned to theso-called pre-charged logic where the bus is cleared between each transferredvalue. Another possible reason is that complex architectures implement sep-arated busses for data and addresses, that may prohibit certain transitions.In all those cases the Hamming weight model is recovered as a particularcase of the more general Hamming distance model.
Fig. 1. Upper: consecutive correlation peaks for two different reference states. Lower:for varying data (0-255), model array and measurement array taken at the time of thesecond correlation peak.
on a common bus. The address of a data word is transmitted just before itsvalue that is in turn immediately followed by the opcode of the next instructionwhich is fetched. Such a behavior can be observed on a wide variety of chipseven those implementing 16 or 32-bit architectures. Correlation rates rangingfrom 60% to more than 90% can often be obtained. Figure 2 shows an exampleof partial correlation on a 32-bit architecture: when only 4 bits are predictedamong 32, the correlation loss is in about the ratio
√8 which is consistent with
the displayed correlations.This sort of results can be observed on various technologies and implemen-
tations. Nevertheless the following restrictions have to be mentioned:
– Sometimes the reference state is systematically 0. This can be assigned to theso-called pre-charged logic where the bus is cleared between each transferredvalue. Another possible reason is that complex architectures implement sep-arated busses for data and addresses, that may prohibit certain transitions.In all those cases the Hamming weight model is recovered as a particularcase of the more general Hamming distance model.
(a)Hammingdistancefrom
datatoref.state
(B)Powerconsumption
Correlationof(a)and(b)– E.g.,poweranalysis
Timingdelaysarenotenough!
Motivation1– SideChannels
• ColdbootattackonDRAMs
TerraSwarm Research Center 4
Halderman,J.A.,etal.,“Lestweremember:Cold-bootattacksonencryptionkeys.“CommunicationsoftheACM,2009
ShamirandSomeren,"PlayingHideandSeekwithStoredKeys", FC99 (ConferenceonFinancialCryptography )
Looking for cryptographic keys
“Playing ’hide and seek’ with stored keys”[Shamir, van Someren 99]
FreezetheDRAMmemoryoftherunningsystemtopreventthedatafromdecaying
Readoutdataandlookforhighentropyindata(cryptographickey)
Motivation2– MOOCforCPS
• CPSclasses– Involvealotofhands-onexperiments
TerraSwarm Research Center 5
e.g.EECS149.1x,Cyber-PhysicalSystemsatUCBerkeleyhttps://www.edx.org/course/cyber-physical-systems-uc-berkeleyx-eecs149-1x
• MOOCforCPSclasses?– NotlikeotherCSclasses– AccuratemodelforCPSwouldhelp
Cyberpart
Physicalpart(environment)
Goals
• BuildingaCPSsimulatorsupportingaccuratecomputerarchitecturemodel
• Demonstrationofanopen-sourceintegratedsimulationtoolforCPSandcomputerarchitecture
• CasestudyusingDRAMpowerandthermalmodeling
TerraSwarm Research Center 6
Background– Tools
• Thegem5architecturesimulator(fromUMich)– Open-sourcepowerful,modular,flexibleandwidelyusedbothinacademiaandindustry
• Characteristics– Object-oriented,discrete-event–Modularcomponents(CPUs,Memories,Buses,Interconnects),easilyinterchangeable
– Simulatedsystem=collectionofobjects
TerraSwarm Research Center 7
Background– Tools
• PtolemyII– Anopen-sourcesoftwareforresearchoncyber-physicalsystems
– DevelopedatUCBerkeleysince1996– Supportsmodelingofboththecyberpart(computation,communication)&physicalprocess(continuousdynamics)
– Quitestable,easytolearnanduse(supportsGUI,onecanbuildamodelbydrawingcomponents)
– Basedonactor-orienteddesign– Moreinformationonhttp://ptolemy.org
TerraSwarm Research Center 8
Background– Tools
• Actor-OrientedDesigninPtolemyII– Actors
• Concurrentlyexecutedcomponents• Interactwithotheractorsthroughinput/outputports
• Modelcomputation,communication,physicalprocesses,etc.
– Directors• ImplementModelsofComputation(MoCs)
• Orchestratebehaviorofactors,forexample,wheneachactorshouldbeexecuted(=fired)
– Actorhierarchy• Anactorcanhavesub-atctors
TerraSwarm Research Center 9Claudius Ptolemaeus, Editor, System Design, Modeling, and Simulation Using Ptolemy II, Ptolemy.org, 2014.
Background– Tools
• ModelofComputation(MoC)– Asetofrulesorchestratingbehaviorofactors(e.g.,whentoexecuteactors,howactorsreacttoinputs)
10TerraSwarm Research Center
DiscreteEvent• Time-stampedevents
(e.g.timerevent,arrivalofmessages)
• Formodelingcomputationorcommunication
ContinuousTime• Sampling-basedsimulation,ODEsolvers• Formodelingphysicalprocesses(e.g.
thermaltransfer)
MultipleMoCs inasinglePtolemyIImodel
DIMM(DualIn-lineMemoryModule)
DRAM DRAMAMB
(AdvancedMemoryBuffer) DRAM DRAM
DRAMtoambient DRAMtoambientAMBtoambient
Coolingairflow
DRAMtoAMBdatatransfer
HeatdissipationtoambientAMBtoDRAMdatatransfer
Airflow
Background– DRAMModel
• DRAMthermalmodelbyLinetal.(ISCA`07)– Powerisproportionaltothroughput(GB/s)– FactorsthataffectDRAMtemperature
TerraSwarm Research Center 11
1.Physicalstructure
2.Heatdissipationfromcomponents
3.Coolingeffect
Linetal.,“ThermalmodelingandmanagementofDRAMmemorysystems”ISCA’07
Approach- IntegratedToolOverview
TerraSwarm Research Center 12
PtolemyIIModel
gem5Simulator
L1#D#Cache#
L1#I#Cache#
CPU#
DRAM#
DiscreteEvent
DiscreteEvent
ContinuousTimeDiscreteEvent
Approach- gem5asCyberPartofCPS
TerraSwarm Research Center 13
PtolemyIIModel
gem5Simulator
L1#D#Cache#
L1#I#Cache#
CPU#
DRAM#
(1)
Approach– Configuringgem5Simulator
• Implementationofgem5– Python– high-levelobjectconfiguration&simulation– C++– low-levelobjectimplementation(forperformance)
• Thegem5Simulatorpythonscripts– Modifyexecutionscriptsforperiodicexecution– gem5runsforgivencyclesandstops– ResumeafterPtolemyIImodelruns
• DRAMcomponent– AddDPRINTFfunctionstoDRAMcomponent– Printoutcommandandcycleinformation
TerraSwarm Research Center 14
Approach– Communicationbetweengem5&PtolemyII
TerraSwarm Research Center 15
PtolemyIIModel
gem5Simulator
L1#D#Cache#
L1#I#Cache#
CPU#
DRAM#
(2)
Approach– Communicationbetweengem5&PtolemyII
TerraSwarm Research Center 16
L1#D#Cache#
L1#I#Cache#
CPU#
DRAM#
gem5%Simulator%
Named%pipe%2%
Named%pipe%1%
Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#
!!!#
“Fire”%(Run#simula1on##for#N#cycles)#
“No;fy”%(Simula1on#finished#&#results#ready)#
Ptolemy%II%Model%
Gem5%Wrapper%Actor%
Store#simula1on#results#
Load#simula1on#results#
Javacustomactorgem5blocksonreadWrapperfire()blocksonread
Simulationinformationtransferred
Approach– Communicationbetweengem5&PtolemyII
TerraSwarm Research Center 17
L1#D#Cache#
L1#I#Cache#
CPU#
DRAM#
gem5%Simulator%
Named%pipe%2%
Named%pipe%1%
Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#
!!!#
“Fire”%(Run#simula1on##for#N#cycles)#
“No;fy”%(Simula1on#finished#&#results#ready)#
Ptolemy%II%Model%
Gem5%Wrapper%Actor%
Store#simula1on#results#
Load#simula1on#results#
(1)Gem5Wrapperinitialize()triggersgem5
Approach– Communicationbetweengem5&PtolemyII
TerraSwarm Research Center 18
L1#D#Cache#
L1#I#Cache#
CPU#
DRAM#
gem5%Simulator%
Named%pipe%2%
Named%pipe%1%
Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#
!!!#
“Fire”%(Run#simula1on##for#N#cycles)#
“No;fy”%(Simula1on#finished#&#results#ready)#
Ptolemy%II%Model%
Gem5%Wrapper%Actor%
Store#simula1on#results#
Load#simula1on#results#
(2)gem5runs
Approach– Communicationbetweengem5&PtolemyII
TerraSwarm Research Center 19
L1#D#Cache#
L1#I#Cache#
CPU#
DRAM#
gem5%Simulator%
Named%pipe%2%
Named%pipe%1%
Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#
!!!#
“Fire”%(Run#simula1on##for#N#cycles)#
“No;fy”%(Simula1on#finished#&#results#ready)#
Ptolemy%II%Model%
Gem5%Wrapper%Actor%
Store#simula1on#results#
Load#simula1on#results#
(3)gem5finishesandstops
Approach– Communicationbetweengem5&PtolemyII
TerraSwarm Research Center 20
L1#D#Cache#
L1#I#Cache#
CPU#
DRAM#
gem5%Simulator%
Named%pipe%2%
Named%pipe%1%
Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#
!!!#
“Fire”%(Run#simula1on##for#N#cycles)#
“No;fy”%(Simula1on#finished#&#results#ready)#
Ptolemy%II%Model%
Gem5%Wrapper%Actor%
Store#simula1on#results#
Load#simula1on#results#
(4)Gem5Wrapperfire()returns
Approach– Communicationbetweengem5&PtolemyII
TerraSwarm Research Center 21
L1#D#Cache#
L1#I#Cache#
CPU#
DRAM#
gem5%Simulator%
Named%pipe%2%
Named%pipe%1%
Shared%File%Memory%trace:%<1me,#access#type,#addr>#<1me,#access#type,#addr>#<1me,#access#type,#addr>#
!!!#
“Fire”%(Run#simula1on##for#N#cycles)#
“No;fy”%(Simula1on#finished#&#results#ready)#
Ptolemy%II%Model%
Gem5%Wrapper%Actor%
Store#simula1on#results#
Load#simula1on#results#
(5)Gem5Wrapperpostfire()triggersgem5again
Approach– ADRAMbehavioralmodelinPtolemyII
TerraSwarm Research Center 22
PtolemyIIModel
gem5Simulator
L1#D#Cache#
L1#I#Cache#
CPU#
DRAM#
(3)
Approach– ADRAMBehavioralModelinPtolemyII
TerraSwarm Research Center 23
Anarrayofrecords
{{bank=5,cmd ="READ",channel =0,service_time =107988},{bank=5,cmd ="READ",channel =0,service_time =108192},{bank=5,cmd ="READ",channel =0,service_time =108418},{bank=5,cmd ="READ",channel =1,service_time =109030},{bank=6,cmd ="WRITE",channel =0,service_time =109078}} Calculatethroughputover
amovingtimewindow
Processcommandsusingservice_time field
Approach– DRAMPower&ThermalModelinPtolemyII
TerraSwarm Research Center 24
PtolemyIIModel
gem5Simulator
L1#D#Cache#
L1#I#Cache#
CPU#
DRAM#
(4)
Approach– DRAMPower&ThermalModelinPtolemyII
• CMOSDevicepower=Staticpower+Dynamicpower
TerraSwarm Research Center 25
1
Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling
Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley
[email protected], [email protected]
P
device
= P
DRAM static
+ P
DRAM dynamic
(1)
P
DRAM
= P
DRAM static
+↵1⇥Throughput
read
+↵2⇥Throughput
write
(2)P
AMB
= P
AMB idle
+�⇥Throughput
Bypass
+�⇥Throughput
Local
(3)
T
AMB
= T
A
+ P
AMB
⇥ AMB
+ P
DRAM
⇥ DRAM AMB
(4)T
DRAM
= T
A
+P
AMB
⇥ AMB DRAM
+P
DRAM
⇥ DRAM
(5)
T (t+4t)� T (t) = (Tstable
� T (t))(1� e
�4t⌧ ) (6)
1
Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling
Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley
[email protected], [email protected]
P
device
= P
DRAM static
+ P
DRAM dynamic
(1)
P
DRAM
= P
DRAM static
+↵1⇥Throughput
read
+↵2⇥Throughput
write
(2)P
AMB
= P
AMB idle
+�⇥Throughput
Bypass
+�⇥Throughput
Local
(3)
T
AMB
= T
A
+ P
AMB
⇥ AMB
+ P
DRAM
⇥ DRAM AMB
(4)T
DRAM
= T
A
+P
AMB
⇥ AMB DRAM
+P
DRAM
⇥ DRAM
(5)
T (t+4t)� T (t) = (Tstable
� T (t))(1� e
�4t⌧ ) (6)
Equations&coefficientsarefromLinetal.,ISCA`07
Staticpower,constant,measuredvalue
Dynamicpower
Coefficients(power/throughput),constant,measuredvalue
Accessestolocalornon-localchannel
• DRAMdynamicpower∝ Throughput
1
Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling
Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley
[email protected], [email protected]
P
device
= P
DRAM static
+ P
DRAM dynamic
(1)
P
DRAM
= P
DRAM static
+↵1⇥Throughput
read
+↵2⇥Throughput
write
(2)P
AMB
= P
AMB idle
+�⇥Throughput
Bypass
+�⇥Throughput
Local
(3)
T
AMB
= T
A
+ P
AMB
⇥ AMB
+ P
DRAM
⇥ DRAM AMB
(4)T
DRAM
= T
A
+P
AMB
⇥ AMB DRAM
+P
DRAM
⇥ DRAM
(5)
T (t+4t)� T (t) = (Tstable
� T (t))(1� e
�4t⌧ ) (6)
Approach– DRAMPower&ThermalModelinPtolemyII
TerraSwarm Research Center 26
• DRAMstabletemperaturefromDRAMpower
• CurrentDRAMtemperature
1
Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling
Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley
[email protected], [email protected]
P
device
= P
DRAM static
+ P
DRAM dynamic
(1)
P
DRAM
= P
DRAM static
+↵1⇥Throughput
read
+↵2⇥Throughput
write
(2)P
AMB
= P
AMB idle
+�⇥Throughput
Bypass
+�⇥Throughput
Local
(3)
T
AMB
= T
A
+ P
AMB
⇥ AMB
+ P
DRAM
⇥ DRAM AMB
(4)T
DRAM
= T
A
+P
AMB
⇥ AMB DRAM
+P
DRAM
⇥ DRAM
(5)
T (t+4t)� T (t) = (Tstable
� T (t))(1� e
�4t⌧ ) (6)
1
Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling
Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley
[email protected], [email protected]
P
device
= P
DRAM static
+ P
DRAM dynamic
(1)
P
DRAM
= P
DRAM static
+↵1⇥Throughput
read
+↵2⇥Throughput
write
(2)P
AMB
= P
AMB idle
+�⇥Throughput
Bypass
+�⇥Throughput
Local
(3)
T
AMB
= T
A
+ P
AMB
⇥ AMB
+ P
DRAM
⇥ DRAM AMB
(4)T
DRAM
= T
A
+P
AMB
⇥ AMB DRAM
+P
DRAM
⇥ DRAM
(5)
T (t+4t)� T (t) = (Tstable
� T (t))(1� e
�4t⌧ ) (6)
Stabletemperatures
Thermalresistance(temperature/power),constant,measuredvalue
Ambienttemperature
1
Integration of the gem5 simulator into Ptolemy IIfor DRAM Power and Thermal Modeling
Hokeun Kim (student) and Armin Wasicek (project mentor) Department of EECS, University of California,Berkeley
[email protected], [email protected]
P
device
= P
DRAM static
+ P
DRAM dynamic
(1)
P
DRAM
= P
DRAM static
+↵1⇥Throughput
read
+↵2⇥Throughput
write
(2)P
AMB
= P
AMB idle
+�⇥Throughput
Bypass
+�⇥Throughput
Local
(3)
T
AMB
= T
A
+ P
AMB
⇥ AMB
+ P
DRAM
⇥ DRAM AMB
(4)T
DRAM
= T
A
+P
AMB
⇥ AMB DRAM
+P
DRAM
⇥ DRAM
(5)
T (t+4t)� T (t) = (Tstable
� T (t))(1� e
�4t⌧ ) (6)
Currenttemperature
Equations&coefficientsarefromLinetal.,ISCA`07
Rateoftemperaturechangeconstant,measuredvalue
Approach– DRAMPower&ThermalModelinPtolemyII
TerraSwarm Research Center 27
(a)$
MoC
InputsPower
Stabletemperature
Currenttemperature
Approach– DRAMPower&ThermalModelinPtolemyII
TerraSwarm Research Center 28
• AMB/DRAMpowertendencyexampleConvergestostable
temperature(Pstatic +Pdynamic) DRAMbecomesidle
Convergestostabletemperature
(Pstatic)
DRAMaccessstartsfromTambient
Temperature(°C)
Time(Sec)
ExperimentsandResults– ExperimentalSetup
• Experimentedon– Differentcacheconfigurations– Differentsoftwareworkloads
• Tomeasure– AverageDRAM/AMBpower– PeakDRAM/AMBtemperaturereachedduringsimulation(0.1secinsimulatedtime)
TerraSwarm Research Center 29
ExperimentsandResults– ExperimentalSetup
• gem5configurations(exceptcaches)– ISA– ARM– CPUType– TimingSimpleCPU:Stallsoneveryloadmemoryaccess.
– Clockrate– CPU:1GHz/System:1GHz– Off-chipDRAMmemory:DDR3SDRAMwithadatarateof1600MHzandabuswidthof16bits.
– Cacheblocksize– 64bytes
TerraSwarm Research Center 30
ExperimentsandResults– PowerandTemperatureResults
• Benchmark– Top5memory-intensiveprogramsfromMiBench• wherememory-intensityisdefinedas#memoryaccesses(read+write)/instruction
TerraSwarm Research Center 31
MiBenchprograms writes reads total instructions
executed memory
intensity (%)
cjpeg_large 6,183 74,966 1,000,000 8.11
rijndael_large 2,558 68,458 1,000,000 7.1
typeset_small 12,843 55,963 1,000,000 6.88
dijkstra_large 4,942 59,198 1,000,000 6.41
patricia_large 4,255 49,198 1,000,000 5.35
ExperimentsandResults– PowerandTemperatureResults
• Powerandtemperatureresultsfordifferentcacheconfigurations– (workload:cjpeg_large)
TerraSwarm Research Center 32
Cache size options (KB) Average power (mW) Maximum temperature increase (10-6 °C)
L1 (I/D) L2 DRAM AMB DRAM AMB
16 N/A 1,057 4,027 2.67 6.05
32 N/A 1,023 4,011 2.63 5.93
64 N/A 1,000 4,008 2.46 5.51
32 128 996 4,006 2.17 4.86
32 256 995 4,006 1.99 4.47
ExperimentsandResults– PowerandTemperatureResults
• Temperatureresultsfordifferentworkloads– (Cacheconfiguration:L1:16kB,L2:N/A)
TerraSwarm Research Center 33
2.7$
3.6$
5.1$
2.1$
2.0$
6.1$
8.3$
12.2$
4.8$
4.8$
0.0$
2.0$
4.0$
6.0$
8.0$
10.0$
12.0$
14.0$
cjpeg_large$$
rijndael_large$$
typeset_small$$
dijkstra_large$$
patricia_large$$
Maxim
um'te
mpe
rature'
increase'(1
026C
°)'
'
DRAM$$
AMB$$
Highmemoryintensity
Lowmemoryintensity
Highbypassthroughput->HighAMBpower
Intensivewriteaccesses
ToolDemonstration
• Gem5tunedfortoolintegration– https://github.com/gem5-ptolemy/
• PtolemyII– http://ptolemy.org• Version11.0– Developmentversion
– Casestudyexamplemodel:• ptolemy/actor/lib/gem5/demo/DramThermalModel.xml
TerraSwarm Research Center 34
Conclusions
• Summary– Thegem5architecturesimulatorisintegratedintoPtolemyIIasancomputerarchitecturalaspectswithhigheraccuracy
– Experimentsshowusefulnessoftheapproach• Futurework– Morearchitecturalinformationfromgem5– Moreapplicationsfortheproposedapproach
• Formoreinformation– https://github.com/gem5-ptolemy/– http://ptolemy.org
TerraSwarm Research Center 35