25
Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures by Ernily Blern, laikrishnan Menon, and Karthikeyan Sankaralingarn Danilo Dominguez Perez [email protected] December 8, 2016 Iowa State University 1

Power Struggles: Revisiting the RISC vs. CISC Debate on …class.ece.iastate.edu/.../cpre581/Fall2016/DominguezPer… ·  · 2016-12-06This Study Revisit the RISC vs. CISC debate

  • Upload
    lephuc

  • View
    232

  • Download
    4

Embed Size (px)

Citation preview

Power Struggles: Revisiting the RISC vs.CISC Debate on Contemporary ARM and x86Architectures

by Ernily Blern, laikrishnan Menon, and KarthikeyanSankaralingarn

Danilo Dominguez [email protected]

December 8, 2016

Iowa State University

1

Overview

• Presentation of the Paper• Background• Infrastructure• Methodology• Results• Conclusion

• Analysis of the Paper

2

Background

Motivation

• Today’s computing landscape is starting to be shaped bysmartphones and tablets

• Previously, RISC architectures governed over smalldevices whereas CISC architectures ruled over serversand desktops

• Nowadays, both architectures are ”trespassing” each otherareas

3

Previous Work

• Previous studies compare performance of RISC vs. CISCISAs finding simple RISC instructions had betterperformance (Bhandarkar & Clark, 1991)

• Later studies conclude both ISAs have similar performanceif they apply aggressive ILP techniques (Isen, John, &John, 2009)

• Informal studies have suggest the power overheads ofCISC ISAs are intractable

4

This Study

• Revisit the RISC vs. CISC debate from a power andenergy perspective

• Previous comparisons focused on performance

• Show what is the role of the ISA and make a case forrevisiting the RISC vs. CISC question

5

RISC vs. CISC ISA

RISC CISC

Format Fixed length instructions Variable length instructions

Operations Single cycle operations Multi-cycle operations

Operands Few addressing modes Many addressing modes

* CISC instructions split into RISC-like micro-ops andmodern compilers pick mostly RISC-like instructions

6

Infrastructure

Infrastructure

• Find implementations with similar microarchitecture andI/O and memory subsystems

• Use workloads for servers, desktops and mobile

• Linux 2.6 LTS kernel

• gcc 4.4 target independent with optimizations enabled (O3)

• perf tool used for performance measurements

• Watssup meter connected directed to the board for powermeasurements

• For instruction mix use DynamoRIO for x86 ISA and gem5for ARM ISA

7

Approach

Overall Approach

8

Platforms

Platform Summary

9

Benchmarks

10

Limitations

• Core: No platform uniformity across ISAs

• Tools: use gcc uniform optimizations. Specific architectureoptimizations less than 10% effect

11

Methodology

Methodology: Performance Analysis

Mea-sure

Use perf tool to measureexecution time on benchmarks

Nor-malize

Normalize frequency’s im-pact using cycle count

CycleCounts

Present dynamic instructioncount measures to understand

differences in cycle count

Not ISAUse microarchitectural eventsto find performance expenses

sources different to ISA

12

Methodology: Power and Energy Analysis

Mea-sure

Present raw powermeasurements

Nor-malize

Factor out technology impactby scaling to 45nm and nor-malizing frequency to 1 GHz

Perfor-mance

Use raw energy to findinterplay between per-formance and power

ISAUse qualitative analysis tofind ISA power’s influence

13

Measured Data Analysis andFindings

Results: Cycle Count

• Performance gaps of up to1.5x from A8 to Atom

• Performance gaps of up to2.5x from A9 to i7

Cycle Count Normalized to i7

14

Results: Instruction Count

• Instruction count similaracross ISAs

• gcc tends to pick similarRISC-like instructions

• CPI was less in x86architectures: geometricmean of 3.4 for A8, 2.2 forA9, 2.1 for Atom and 0.7for i7

Instruction Count Normalized toi7

15

Results: Instruction Format and Mix

• Similar code densities inboth architectures

• Fraction of loads andstores similar across ISAfor all suites

• Large instruction counts forARM are due to absenceof FP instructions likefsincon, fy12xpl

• ISA effects areindistinguishable betweenx86 and ARMimplementations

Instruction Mix

SelectedInstruction Counts

16

Results: Microarchitecture

• Microarchitecture has the most impact on performance

• The large cache, accurate branch prediction and biggerissue width on OoO processors (A9 and i7) help x86 tohave better performance

• ISA has not considerable effect on performance

17

Results: Power and Energy

• With scaled frequency andtechnology, ISA isirrelevant

• i7 processors are designedto have high performancebut they are notpower-optimized

• Atom, on the other hand,have power consumptionon par with ARM A8 andA9

Tech. Independent Avg. PowerNormalized to A8

RawAverage Energy Normalized to

A8

18

Conclusions

• x86 CPI < ARM CPI

• ISA performance effects indistinguishable between x86and ARM

• Beyond micro-op translation, x86 ISA introduces nooverheads over ARM ISA

• Choice of performance optimizations impacts powerconsumption more than ISA

• ISA’s impact on energy is insignificant

19

Analysis of the Paper

• The paper was introduced as an analysis on the effect ofISAs on power and energy but end up presenting moredata on performance

• A more interesting study would be to find whatoptimizations for performance make i7 consume morepower

• Also, what factors make Atom to have performance on parof ARM A9 and be power-friendly

20

Bibliography i

References

Bhandarkar, D., & Clark, D. W. (1991). Performance fromarchitecture: comparing a risc and a cisc with similarhardware organization. In Acm sigarch computerarchitecture news (Vol. 19, pp. 310–319).

Isen, C., John, L. K., & John, E. (2009). A tale of twoprocessors: Revisiting the risc-cisc debate. In Specbenchmark workshop (pp. 57–76).

21