LogCA: A High-Level Performance Model for Hardware Accelerators
Muhammad Shoaib Bin Altaf* and David A. Wood
University of Wisconsin-Madison
"Everything should be made as simple as possible, but not simpler." (Albert Einstein)
* Now at AMD Research, Austin TX



Page 2: Executive Summary

• Accelerators do not always perform as expected
• It is crucial for programmers and architects to understand the factors that affect performance
• Simple analytical models are beneficial early in the design stage
• Our proposal: LogCA
  – High-level performance model
  – Helps identify design bottlenecks and possible optimizations
• Validation across a variety of on-chip and off-chip accelerators
• Two retrospective case studies demonstrate the usefulness of the model

Page 3: Outline

• Motivation
• LogCA
• Results
• Conclusion

Page 4: Why Do We Need a Model?

"An accelerator is a separate architectural substructure ... that is architected using a different set of objectives than the base processor, ...., the accelerator is tuned to provide HIGHER PERFORMANCE ….. than with the general-purpose base hardware"

S. Patel and W. Hwu. Accelerator Architectures. Micro 2008

[Figures: M7: Next Generation SPARC, Hot Chips 26, 2014; Power8, Hot Chips 25, 2013]

Page 5: Why a Model?

[Figure: execution time (ms, log scale) versus block size (bytes) for an encryption algorithm on UltraSPARC T2, comparing host and accelerator. The break-even point separates the region where the host outperforms from the region where the accelerator outperforms.]

Amdahl's Law for Accelerators

Page 6: Why a Model?

[Figure: speedup (log scale) versus offloaded data (bytes) for the Advanced Encryption Standard (AES) on UltraSPARC T2, SPARC T4, and a GPU, with break-even points marked.]

Running the same kernel, accelerators can have different break-even points.

Page 7: Outline

• Motivation
• LogCA
• Results
• Conclusion

Page 8: The Performance Model

• Inspired by LogP [CACM 1996]
• Abstract accelerator using five parameters
  – L, Latency: Cycles to move data
  – o, Overhead: Setup cost
  – g, Granularity: Size of the off-loaded data
  – C, Computational index: Amount of work done per byte of data
  – A, Acceleration: Speedup ignoring overheads
• Sixth parameter β generalizes to kernels with non-linear complexity

[Figure: host and accelerator connected through an interface.]

Page 9: The Performance Model

• Execution w/o an accelerator
  – $T_0(g) = C_0(g)$
• Execution with one accelerator
  – $T_1(g) = o_1(g) + L_1(g) + C_1(g)$, where $C_1(g) = C_0(g)/A$

[Figure: timeline comparing $T_0(g) = C_0(g)$ on the host with $T_1(g)$, made up of $o_1(g)$, $L_1(g)$, and $C_1(g)$, when offloading through the interface to the accelerator. The time saved is labeled "Gain".]
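The two execution-time equations can be sketched directly in code. This is a minimal illustration, not the authors' implementation: the functional forms $o_1(g) = o$, $L_1(g) = L$ or $L \cdot g$, and $C_0(g) = C \cdot g^{\beta}$ are assumptions chosen to match the parameter descriptions, and the class name, the `latency_scales` flag, and the numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogCA:
    L: float            # latency: cycles to move data
    o: float            # overhead: setup cost in cycles
    C: float            # computational index: work (cycles) per byte
    A: float            # acceleration: speedup ignoring overheads
    beta: float = 1.0   # complexity exponent (1.0 means a linear kernel)
    latency_scales: bool = False  # assumption: True means L1(g) = L * g

    def t0(self, g: float) -> float:
        """Host-only time T0(g) = C0(g); C0(g) = C * g**beta is assumed."""
        return self.C * g ** self.beta

    def t1(self, g: float) -> float:
        """Accelerated time T1(g) = o1(g) + L1(g) + C1(g)."""
        o1 = self.o                                    # assumed constant in g
        l1 = self.L * g if self.latency_scales else self.L
        c1 = self.t0(g) / self.A                       # C1(g) = C0(g) / A
        return o1 + l1 + c1

    def speedup(self, g: float) -> float:
        """Speedup(g) = T0(g) / T1(g)."""
        return self.t0(g) / self.t1(g)


model = LogCA(L=10.0, o=100.0, C=5.0, A=10.0)  # made-up parameter values
for g in (16, 64, 256, 4096):
    print(g, round(model.speedup(g), 2))
```

With these made-up values the speedup is below 1 at 16 bytes and climbs toward A as g grows, which is the break-even behaviour shown in the motivation plots.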

Page 10: Granularity independent latency

• Captures the effect of granularity on speedup
• Speedup bounded by acceleration
  – $\lim_{g \to \infty} \mathit{Speedup}(g) = A$
• Overheads dominate at smaller granularities
  – $\mathit{Speedup}(g)\big|_{g=1} = \dfrac{C}{o + L + \frac{C}{A}} < \dfrac{C}{o + L}$

[Figure: Speedup(g) (log scale) versus granularity (bytes); the curve rises with granularity and saturates at A.]

Amdahl's law for Accelerators
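Both limits on this slide follow in one step from the model equations. The derivation below is a sketch that assumes forms the slides do not spell out: constant overhead $o_1(g) = o$, constant latency $L_1(g) = L$, and host time $C_0(g) = C g^{\beta}$ with $\beta > 0$.

```latex
\begin{align*}
\mathit{Speedup}(g) &= \frac{T_0(g)}{T_1(g)}
  = \frac{C g^{\beta}}{o + L + \dfrac{C g^{\beta}}{A}}
  = \frac{A}{1 + \dfrac{A\,(o + L)}{C g^{\beta}}} \\[4pt]
\lim_{g \to \infty} \mathit{Speedup}(g) &= A
  \qquad \text{since } \frac{A\,(o + L)}{C g^{\beta}} \to 0 \\[4pt]
\mathit{Speedup}(1) &= \frac{C}{o + L + C/A} < \frac{C}{o + L}
\end{align*}
```

The last line shows why small offloads are overhead-bound: at $g = 1$ the acceleration $A$ only appears in the small $C/A$ term, so the speedup is governed by $o + L$.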

Page 11: Performance Metrics

• Right amount of off-loaded data?
• Inspired by the vector machine metrics $N_v$ and $N_{1/2}$
• $g_1$: Granularity for a speedup of 1
  – $g_1$ is essentially independent of acceleration
  – Identifies the complexity of the interface
• $g_{A/2}$: Granularity for a speedup of $A/2$
  – Increasing A also increases $g_{A/2}$

[Figure: speedup (log scale) versus granularity (bytes), with $g_1$ and $g_{A/2}$ marked on the granularity axis and A as the upper bound. A simple interface gives a small $g_1$; a complex interface gives a large $g_1$.]
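Both metrics have a closed form in the simplest case. The sketch below assumes a linear kernel ($\beta = 1$) with constant overhead and latency, so that Speedup(g) = C·g / (o + L + C·g/A); solving for g at a target speedup s gives g = s(o + L) / (C(1 − s/A)). The function name and parameter values are hypothetical.

```python
def granularity_for_speedup(s, L, o, C, A):
    """Granularity g at which Speedup(g) equals s.

    Assumes a linear kernel with constant overhead o and latency L,
    i.e. s = C*g / (o + L + C*g/A). Requires 0 < s < A.
    """
    if not 0 < s < A:
        raise ValueError("target speedup must lie strictly between 0 and A")
    return s * (o + L) / (C * (1.0 - s / A))


L, o, C = 10.0, 100.0, 5.0   # made-up parameter values
for A in (10.0, 100.0, 1000.0):
    g1 = granularity_for_speedup(1.0, L, o, C, A)          # break-even point
    g_half = granularity_for_speedup(A / 2.0, L, o, C, A)  # half of peak
    print(f"A={A:6.0f}  g1={g1:7.2f}  g_A/2={g_half:9.1f}")
```

Under these assumptions $g_1 = (o + L)A / (C(A - 1))$, which stays close to $(o + L)/C$ for any large A, while $g_{A/2} = A(o + L)/C$ grows linearly with A. That matches the two sub-bullets: $g_1$ reflects the interface cost, and $g_{A/2}$ moves out as acceleration increases.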

Page 12: Granularity dependent latency

• Speedup bounded by computational intensity $C/L$
  – $\lim_{g \to \infty} \mathit{Speedup}(g) < \dfrac{C}{L}$ (linear algorithms)
• Speedup for sub-linear algorithms asymptotically decreases with the increase in granularity

[Figure, two panels: Speedup(g) (log scale) versus granularity (bytes), each with A marked. "Linearly": the curve saturates below $C/L$. "Sub-linearly": speedup falls as g increases.]
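A short numeric check makes the two regimes concrete. The sketch assumes latency proportional to granularity, $L_1(g) = L \cdot g$, and host time $C_0(g) = C \cdot g^{\beta}$; neither form is stated on the slide, and the parameter values are made up so that $C/L = 5$.

```python
def speedup(g, L, o, C, A, beta):
    """Speedup(g) with latency proportional to g (assumed L1(g) = L * g)."""
    host = C * g ** beta              # assumed C0(g) = C * g**beta
    accel = o + L * g + host / A      # o1 + L1(g) + C1(g)
    return host / accel


L, o, C, A = 1.0, 100.0, 5.0, 10.0    # made-up values, so C/L = 5
for g in (10, 10**3, 10**5, 10**7):
    linear = speedup(g, L, o, C, A, beta=1.0)
    sublinear = speedup(g, L, o, C, A, beta=0.5)
    print(f"g={g:>8}  linear={linear:.3f}  sub-linear={sublinear:.4f}")
```

For the linear kernel the speedup approaches $C / (L + C/A)$, which is always below $C/L$. For the sub-linear kernel the $L \cdot g$ term eventually outgrows the work being accelerated, so the speedup tends to zero at large granularities.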

Page 13: Granularity dependent latency

• Computational intensity must be greater than 1 to achieve any speedup
• Computational intensity should be greater than peak performance to achieve A/2

[Figure: speedup versus granularity (bytes), with $g_1$ marked at a speedup of 1 (annotated $C/L \geq 1$) and $g_{A/2}$ marked at a speedup of $A/2$ (annotated $C/L \geq A$); A is the upper bound.]

Performance metrics help programmers early in the design cycle
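The two thresholds can be checked against the large-granularity limit of the model. This is a sketch under stated assumptions, not a derivation taken from the slides: a linear kernel, latency proportional to granularity ($L_1(g) = L g$), and a constant overhead $o$ that becomes negligible as $g \to \infty$.

```latex
\begin{align*}
S_{\infty} &= \lim_{g \to \infty} \frac{C g}{o + L g + C g / A}
            = \frac{C}{L + C/A} \\[4pt]
S_{\infty} \geq \frac{A}{2}
  &\iff 2C \geq A L + C
  \iff \frac{C}{L} \geq A \\[4pt]
S_{\infty} > 1
  &\iff C\left(1 - \frac{1}{A}\right) > L
  \iff \frac{C}{L} > \frac{A}{A - 1}
\end{align*}
```

For $A \gg 1$ the last bound approaches 1, which agrees with the first bullet; the middle line reproduces the $C/L \geq A$ annotation for reaching $A/2$.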

Page 14: Bottleneck Analysis using LogCA

[Figure: speedup (log scale) versus granularity (bytes) for the baseline LogCA curve and four variants: L_0.1x, o_0.1x, C_10x, and A_10x. Regions of the plot are annotated with the parameters that matter there (o, C, A), and the $C/L$ bound is marked.]

• 10X change in parameter → 20% performance gain
• Helps focus on performance bottlenecks
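The sweep behind this plot can be sketched as a small sensitivity test. The code below changes one parameter at a time by 10x, using the variant names from the figure legend, and reports which changes improve speedup by at least 20% at a given granularity. Treating the 10X/20% bullet as a bottleneck threshold is an interpretation, and the linear kernel, the $L_1(g) = L \cdot g$ latency form, the function names, and the parameter values are all assumptions.

```python
def speedup(g, p):
    """LogCA speedup for a linear kernel with latency proportional to g (assumed forms)."""
    host = p["C"] * g
    return host / (p["o"] + p["L"] * g + host / p["A"])


def bottlenecks(g, base, factor=10.0, threshold=0.20):
    """Return {variant: relative gain} for each variant that clears the threshold."""
    variants = {
        "L_0.1x": {**base, "L": base["L"] / factor},
        "o_0.1x": {**base, "o": base["o"] / factor},
        "C_10x": {**base, "C": base["C"] * factor},
        "A_10x": {**base, "A": base["A"] * factor},
    }
    ref = speedup(g, base)
    gains = {name: speedup(g, p) / ref - 1.0 for name, p in variants.items()}
    return {name: gain for name, gain in gains.items() if gain >= threshold}


base = {"L": 0.1, "o": 1000.0, "C": 5.0, "A": 10.0}   # made-up parameter values
for g in (16, 1024, 1 << 20):
    print(g, sorted(bottlenecks(g, base)))
```

With these made-up values, overhead and computational index are the sensitive parameters at small granularities, acceleration joins them at medium granularities, and only acceleration remains at large granularities, which is the same progression as the region labels in the figure.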

Page 15: Outline

• Motivation
• LogCA
• Results
• Conclusion

Page 16: Experimental Methodology

• Fixed-function and general-purpose accelerators
  – Cryptographic accelerators on SPARC architectures
  – Discrete and integrated GPUs
• Kernels with varying complexities
  – Encryption, Hashing, Matrix Multiplication, FFT, Search, Radix Sort
• Retrospective case studies
  – Cryptographic interface in SPARC architectures
  – Memory interface in GPUs

Page 17: Case Study I: Cryptographic Interface in the SPARC Architecture

[Figure: the cryptographic interfaces compared: PCIe Crypto Accelerator, UltraSPARC T2, SPARC T3, SPARC T4 engine, SPARC T4 instructions.]

Page 18: Conclusion

• Simple models are effective in predicting the performance of accelerators
• Proposed a high-level performance model for hardware accelerators
• These models help programmers and architects visually identify bottlenecks and suggest optimizations
• Performance metrics help programmers decide the right amount of offloaded data
• Limitations include the inability to model resource contention, caches, and irregular memory access patterns

Page 19: Questions?

Image source: http://www.medarcade.com/