Cut Power Consumption by 5x Without Losing Performance A big.LITTLE Software Strategy Klaas van Gend FAE, Trainer & Consultant

Cut Power Consumption by 5x Without Losing Performance · Power Consumption on “Flashcanvas perf” Benchmark Standard Blink on A15+GPU Parallelized Blink on quad-A7 Difference


Cut Power Consumption by 5x

Without Losing Performance

A big.LITTLE Software Strategy

Klaas van Gend

FAE, Trainer & Consultant


2 | October 10, 2014 LINUXCON EUROPE 2014

The mandatory Klaas-in-a-Plane picture


Quad Core vs. Dual Core – Why isn’t it Twice as Fast?

VS
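The slide only poses the question. One standard explanation, offered here as an assumption rather than as the speaker's own argument, is Amdahl's law: if only a fraction p of the work can run in parallel, n cores give a speedup of 1 / ((1 - p) + p / n). A minimal sketch, with p = 0.8 chosen purely for illustration:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.8  # illustrative value, not a measurement from the talk
    dual = amdahl_speedup(p, 2)
    quad = amdahl_speedup(p, 4)
    print(f"dual: {dual:.2f}x  quad: {quad:.2f}x  quad/dual: {quad / dual:.2f}x")
```

With 80% of the work parallel, two cores give about 1.67x and four cores 2.5x, so the quad core is only 1.5 times as fast as the dual core.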


The GHz race


Why GHz++ costs power^2
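The extracted slide carries only this title. As background (standard CMOS reasoning, not taken from the slide), dynamic power depends on the activity factor α, the switched capacitance C, the supply voltage V and the clock frequency f. The voltage enters squared, and higher clock rates usually need a higher supply voltage, so power grows faster than linearly with frequency:

```latex
P_{\text{dyn}} = \alpha \, C \, V^{2} \, f
```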


ARM big.LITTLE

“OK, heavy work costs power.

Let’s not waste power on light work…”


ARM playing it cool: big.LITTLE

Source: http://community.arm.com/groups/processors/blog/2013/06/18/ten-things-to-know-about-biglittle/


A7 vs A15

Cortex A7:

• Less silicon area

• Less optimal cycles

• Fewer cycles/second

• More power efficient


How to use big.LITTLE today

Source: http://community.arm.com/groups/processors/blog/2013/06/18/ten-things-to-know-about-biglittle/


Some available big.LITTLE hardware

• AllWinner A80

• Renesas automotive silicon

• Samsung Galaxy S4 for South-Korean market

• Samsung Galaxy S5 for South-Korean market

• Hardkernel ODROID-XU board: Exynos5, built-in power measurement


Use Case: Chromium


Chrome / Chromium / ChromeShell

• Chromium: open source browser, based on Blink (which succeeded WebKit, itself derived from KHTML)

• Google Chrome: closed-source browser based on Chromium

• ChromeShell: open source Chromium “browser” for Android

• Chrome for Android: closed-source browser for Android


Chromium workload, visualized

[Figure: Chromium workload broken down into Loading, Parsing, Layouting/Rendering, Painting, JavaScript and Canvas]


HTML5 Canvas: “graphics device for JavaScript”



Parallelizing Canvas: “not as easy as it looks”


Canvas Parallelization - Performance Results

Benchmark on quad-core   Standard Blink   Parallelized Blink   Performance improvement
Flashcanvas perf         1.69 score       2.44 score           44%
Fc perf w/ alpha         1.04 score       1.52 score           50%
Guimark2 Vector          9.5 fps          13.3 fps             40%
Canvasmark ‘13           3475 score       4116 score           53%
Average improvement                                            47%

With parallelism you can improve performance of even the most complex applications!


Google Chrome


Google Chrome on Odroid-XU+E

Using Google’s Chrome (version 33 for Android)

• 2 cores active: 54% and 84%

• Power use A15+A7 cores: 2.374 Watts

• Test average: 9.44 fps


Our ChromeShell on Odroid-XU+E

Using our optimized ChromeShell:

• 3 A15 cores active: 59%, 63% and 38%

• Power use A15+A7 cores: 3.116 Watts

• Test average: around 14 fps


Canvas Parallelization - works even on ‘normal’ silicon like Qualcomm Snapdragon 800

• LG’s NEXUS 5 phone

• Quad core Qualcomm Snapdragon 800

• Phone heating up similarly in both cases

• Default Chrome: average 7.12 fps

• “Our” ChromeShell: average 14.48 fps


Canvas Parallelization - Power Consumption on “Flashcanvas perf”

Benchmark            Standard Blink on A15+GPU   Parallelized Blink on quad-A7   Difference
No optimization      29 fps                      17 fps                          -40%
Performance          29 fps                      26 fps                          -10%
Power consumption    2.2 W                       0.4 W                           550%
Performance / Watt   13 fps/W                    65 fps/W                        490%

With parallelism and right chip choices

• you can get 5x power savings

• without losing performance!
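The derived rows of the table can be recomputed from the fps and Watt figures. The sketch below does only that arithmetic; the dictionary and function names are made up for illustration and are not from the talk.

```python
# Figures from the slide: frames per second and power draw per configuration.
STANDARD_A15_GPU = {"fps": 29.0, "watts": 2.2}
PARALLEL_QUAD_A7 = {"fps": 26.0, "watts": 0.4}


def perf_per_watt(config: dict) -> float:
    """Frames per second delivered per Watt."""
    return config["fps"] / config["watts"]


def percent_of(a: float, b: float) -> float:
    """`a` expressed as a percentage of `b`."""
    return 100.0 * a / b


if __name__ == "__main__":
    std = perf_per_watt(STANDARD_A15_GPU)
    par = perf_per_watt(PARALLEL_QUAD_A7)
    power = percent_of(STANDARD_A15_GPU["watts"], PARALLEL_QUAD_A7["watts"])
    fps_change = percent_of(PARALLEL_QUAD_A7["fps"], STANDARD_A15_GPU["fps"]) - 100.0
    print(f"standard: {std:.1f} fps/W, parallelized: {par:.1f} fps/W")
    print(f"power: {power:.0f}%, efficiency: {percent_of(par, std):.0f}%, fps: {fps_change:.0f}%")
```

This gives about 13.2 fps/W against 65 fps/W: roughly 4.9 times the efficiency at 5.5 times lower power, for about 10% fewer frames per second.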


Comparing performance / watt:

Using Google’s Chrome (version 33 for Android)

• 2 cores active: 54% and 84%

• Power use A15+A7 cores: 2.374 Watts

• Test average: 9.44 fps

Using our optimized ChromeShell:

• 3 A7 cores active: 73%, 80% and 44%

• Power use A15+A7 cores: 0.472 Watts

• Test average: 10.04 fps
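Another way to read these two measurements is energy per rendered frame: a Watt is a Joule per second, so dividing power by frame rate gives Joules per frame. A small sketch using only the numbers above (the function name is illustrative):

```python
def joules_per_frame(watts: float, fps: float) -> float:
    """Energy spent per rendered frame, given average power and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return watts / fps


if __name__ == "__main__":
    chrome = joules_per_frame(2.374, 9.44)        # Google Chrome 33
    chromeshell = joules_per_frame(0.472, 10.04)  # optimized ChromeShell on A7 cores
    print(f"Chrome:      {chrome:.3f} J/frame")
    print(f"ChromeShell: {chromeshell:.3f} J/frame")
    print(f"power ratio: {2.374 / 0.472:.2f}x, energy-per-frame ratio: {chrome / chromeshell:.2f}x")
```

That is about 0.25 J per frame for Chrome against about 0.047 J for the optimized ChromeShell: roughly 5 times less power and roughly 5.3 times less energy per frame.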


1x A15 or 4x A7?


1x A15 < 4x A7 !

20000

MIPS

More than twice the performance

Less W


Back to big.LITTLE

Making these results work outside a lab


State of big.LITTLE in Linux - 1: What’s in the kernel today?


State of big.LITTLE in Linux - 2: What else is relevant?

• IKS (In-Kernel Switcher)

– First available in Linaro kernel trees

– Merged in 3.11 kernel

• Qualcomm / LG / etc. power daemons

– Throttle performance if cores overheat

– Usually “secret”

• Not-in-mainline schedulers:

– Linaro’s GTS (Global Task Scheduler),

– a.k.a. HMP (Heterogeneous Multi-Processing)

• Kernel Summit 2014 “Energy-Aware Scheduling Workshop”


Feedback loop

• We know when we want to have 4xA7 or 1xA15

• If we can tell the kernel, it can anticipate

– instead of noticing an increase in workload

– and by accident turning on the A15s

Setpoint
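The slide asks for a way to tell the kernel which cores a workload should use. As a coarse stand-in that exists today, and not the feedback mechanism the talk proposes, a process can restrict itself to a set of CPUs with CPU affinity. A minimal Linux-only sketch; the helper name is hypothetical, and which CPU ids form the LITTLE cluster is an assumption that differs per board and kernel:

```python
import os
from typing import Set

# Assumption: CPU ids 0-3 are the A7 (LITTLE) cores. Check your board's topology.
LITTLE_CPUS = {0, 1, 2, 3}


def pin_to_cpus(wanted: Set[int]) -> Set[int]:
    """Restrict the calling process to those `wanted` CPUs that are available.

    Returns the affinity mask in effect afterwards. If none of the wanted
    CPUs is available, the current mask is left unchanged.
    """
    available = os.sched_getaffinity(0)  # 0 means "the calling process"
    usable = wanted & available
    if usable:
        os.sched_setaffinity(0, usable)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("now running on CPUs:", sorted(pin_to_cpus(LITTLE_CPUS)))
```

Affinity only says where a task may run. It does not tell the kernel how heavy the upcoming workload is, which is the information the feedback loop on this slide is about.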


Where to go?

• Qualcomm MARE: “feedback loop” in user space

– Research project

– Framework to aid parallelization

– Should assist kernel in scheduling/cpufreq

• Deadline scheduler: “feedback loop” in kernel space

– Merged in Linux 3.14

– Application sets SCHED_DEADLINE

– Application sets scheduling attributes

– Task repetition in microseconds

– Task start within repetition

– Task completion deadline within repetition
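As a rough illustration of the deadline scheduler bullets, the sketch below fills in a `sched_attr` structure and passes it to the `sched_setattr` system call through `ctypes`. The kernel interface takes runtime, deadline and period in nanoseconds; the hypothetical helper `deadline_attr` accepts microseconds to match the slide's wording. The syscall numbers in the table are assumptions to be checked against your own kernel headers, and setting the policy usually requires root.

```python
import ctypes
import os
import platform

SCHED_DEADLINE = 6  # policy number from the Linux UAPI headers

# Assumed sched_setattr syscall numbers per architecture; verify before use.
_SYS_SCHED_SETATTR = {"x86_64": 314, "aarch64": 274, "armv7l": 380}


class SchedAttr(ctypes.Structure):
    """Mirror of the kernel's struct sched_attr (original 48-byte layout)."""
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("sched_policy", ctypes.c_uint32),
        ("sched_flags", ctypes.c_uint64),
        ("sched_nice", ctypes.c_int32),
        ("sched_priority", ctypes.c_uint32),
        ("sched_runtime", ctypes.c_uint64),   # nanoseconds
        ("sched_deadline", ctypes.c_uint64),  # nanoseconds
        ("sched_period", ctypes.c_uint64),    # nanoseconds
    ]


def deadline_attr(runtime_us: int, deadline_us: int, period_us: int) -> SchedAttr:
    """Build SCHED_DEADLINE attributes from values given in microseconds."""
    if not 0 < runtime_us <= deadline_us <= period_us:
        raise ValueError("need 0 < runtime <= deadline <= period")
    attr = SchedAttr()
    attr.size = ctypes.sizeof(SchedAttr)
    attr.sched_policy = SCHED_DEADLINE
    attr.sched_runtime = runtime_us * 1000
    attr.sched_deadline = deadline_us * 1000
    attr.sched_period = period_us * 1000
    return attr


def set_deadline(attr: SchedAttr) -> None:
    """Apply `attr` to the calling thread; raises OSError if the kernel refuses."""
    number = _SYS_SCHED_SETATTR.get(platform.machine())
    if number is None:
        raise OSError("no known sched_setattr syscall number for this machine")
    libc = ctypes.CDLL(None, use_errno=True)
    result = libc.syscall(ctypes.c_long(number), ctypes.c_int(0),
                          ctypes.byref(attr), ctypes.c_uint(0))
    if result != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

A frame-based renderer might, for example, call `set_deadline(deadline_attr(5000, 10000, 16667))` to ask for 5 ms of CPU time within every 16.667 ms period; those numbers are illustrative only.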


Is parallelism going to stay? Actually, is big.LITTLE going to stay???

• The GHz race has come to an end

– Now also for ARM

• The speed of light limits “clock domain size”

• Thus many clock islands on a die

– Multicore is just an “easy” way to improve performance

– At the cost of the programmer

– Who needs extra training

• ARM big.LITTLE

– Is a mechanism to skip heavy power consumption

– At the cost of more mm² silicon

– Is it worth it???


My ideal ARM-based design:

big: 1x A57

LITTLE: 4x A53

Why is no-one designing this chip?


Conclusions


Conclusion

• big.LITTLE works

• IFF

– Short bursts can be handled by one ‘big’ core

– Heavier workloads are parallelizable and run on clusters of LITTLEs

– APIs become available: programs must indicate what the workload will be

BTW: Chromium is parallelizable – we did it.



Vector Fabrics – the Company

• Founded February 2007

• Founding team

– Strong in SoC design and multi-core software

– Currently 15 FTE: 6 PhD, 7 MSc

• Protected technology

– 3 patents filed in US & Europe

• Recognition

– “Hot Startup” in EE Times Silicon 60, since 2011

– Selected by Gartner as “Cool vendor in Embedded Systems & Software” 2013

– Global Semiconductors Alliance award, March 2013


Contact Information

• Web: www.vectorfabrics.com

• Email: [email protected]

• Tel: +31 40 8200960

• Address: Vector Fabrics B.V., Vonderweg 22, 5616RM Eindhoven, The Netherlands


Thank You!

(drop your business card if you want the slides and the to-be-released whitepaper)

Klaas van Gend

FAE, Trainer & Consultant

[email protected]