
Adaptive Video Coding to Reduce Energy on General Purpose Processors

Daniel Grobe Sachs, Sarita Adve, Douglas L. Jones
University of Illinois at Urbana-Champaign

http://www.cs.uiuc.edu/grace

[email protected]

Introduction

Wireless multimedia increasingly common

Recent advances reduce constraints:
  2GHz+ processors
  High-speed wireless networks

Systems are now energy-limited
Energy management is essential

Adaptation

Adaptation is key to energy management
Hardware adaptation already common
Software adaptation also possible

Challenges:
  How do we control adaptations?
  How do we coordinate different adaptations?

GRACE Project

Targets mobile multimedia devices
Coordinated adaptation of all system layers:
  Hardware, application, network, OS
Complete cross-layer adaptation framework

Preserves separation between layers

Goals of this work

Target: wireless video transmission
  Adapt application: adaptive video encoder
  Adapt hardware: adaptive CPU

Implement part of the GRACE framework
Trade off between CPU and network energy

Contributions

Apply existing adaptive-CPU research
Energy-adaptive video encoder:
  Trades off between network and CPU energy
  Allows adaptation with fixed QoS

Cross-layer adaptation framework:
  Coordinates app and CPU adaptation
  Preserves logical separation between layers

20% energy savings over existing systems

Presentation Overview

System model
System architecture and design
Cross-layer adaptation process
Results

System Model

Total Energy = CPU Energy + Network Energy

[Diagram: video capture feeds an adaptive video encoder running on an adaptive CPU; encoded frames go out over the wireless network; a control module coordinates the encoder and CPU adaptations.]
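
In symbols (our notation, not the slides'), the quantity the controller minimizes each frame is

\[ E_{\text{total}} = E_{\text{CPU}} + E_{\text{net}} = N_{\text{instr}} \cdot e_{\text{instr}} + N_{\text{byte}} \cdot e_{\text{byte}} \]

where the per-instruction energy comes from the [MICRO'01] CPU model and the per-byte energy from the WaveLAN measurements cited later in the talk.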

CPU Hardware Adaptation [Micro]

Reduce performance to save energy
Voltage and frequency scaling:
  Lower frequency → lower voltage → lower energy

Architecture adaptation:
  Issue width
  Active functional units (ALUs, etc.)
  Instruction window size
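
For context, the standard CMOS relations behind the frequency/voltage bullet (textbook scaling rules, not results from this work): dynamic energy per operation scales roughly with the square of the supply voltage, and the achievable clock frequency falls as the voltage falls, so slowing down lets the voltage, and with it the energy per instruction, drop:

\[ E_{\text{dyn}} \propto C \, V_{dd}^{2}, \qquad f_{\max} \propto \frac{(V_{dd}-V_t)^{\alpha}}{V_{dd}} \]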

Adaptive Encoder

Based on the TMN H.263 encoder
  Changed to logarithmic motion search

Encoder adapts for energy:
  Trades off between network and CPU energy
  More computation → fewer bits

Adapt motion search and DCT:
  Computationally expensive
  Elimination affects primarily rate

Adaptive Encoder Details

Motion search and DCT thresholds:
  Terminate motion search early when SAD is under threshold
  Skip DCT if SAD of block is under threshold

Transmit a “DCT flag” bit for each 8x8 block
  Extends the H.263 standard

Adaptation effect: setting thresholds at infinity
  Reduces CPU load by ~50%
  Increases data rate by 2x or more
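
A minimal C sketch of the two threshold adaptations above, assuming illustrative names and data layouts; this is not the TMN encoder's code, and sad_block, search_motion, dct_decision, and the threshold parameters are hypothetical:

/* A minimal sketch of the two SAD-threshold adaptations, assuming
 * illustrative names and layouts -- this is not the TMN encoder's code. */
#include <stdint.h>
#include <stdlib.h>

typedef struct { int dx, dy; } mv_t;

/* Sum of absolute differences between a current block and a predictor. */
uint32_t sad_block(const uint8_t *cur, const uint8_t *ref, int stride, int size)
{
    uint32_t sad = 0;
    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
            sad += (uint32_t)abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}

/* Adaptation 1: terminate the motion search early once the best SAD falls
 * below ms_threshold; an infinite threshold stops after the first
 * candidate, saving computation at the cost of extra bits. */
mv_t search_motion(const uint8_t *cur, const uint8_t *ref, int stride,
                   const mv_t *cand, int n_cand, uint32_t ms_threshold)
{
    mv_t best = cand[0];
    uint32_t best_sad = UINT32_MAX;
    for (int i = 0; i < n_cand; i++) {
        uint32_t sad = sad_block(cur, ref + cand[i].dy * stride + cand[i].dx,
                                 stride, 16);
        if (sad < best_sad) { best_sad = sad; best = cand[i]; }
        if (best_sad < ms_threshold)   /* good enough -- stop searching */
            break;
    }
    return best;
}

/* Adaptation 2: skip the DCT for an 8x8 block whose residual SAD is under
 * dct_threshold; the caller emits the returned "DCT flag" bit (the H.263
 * extension on this slide) so the decoder knows whether coefficients follow. */
int dct_decision(const uint8_t *cur, const uint8_t *pred, int stride,
                 uint32_t dct_threshold)
{
    return sad_block(cur, pred, stride, 8) >= dct_threshold;  /* 1 = run DCT */
}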

Adaptation Control

When do we adapt?
  Adapt before every frame

What configurations do we choose?
  Must minimize total CPU+network energy
  Must complete frame within its allocated time

How do we find the optimal configurations?
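
Stated as an optimization problem (our notation, not the slides'): each frame, pick the application configuration a and CPU configuration c that minimize predicted total energy while meeting the frame's time allocation:

\[ \min_{a,\; c} \; E_{\text{CPU}}(a,c) + E_{\text{net}}(a) \quad \text{subject to} \quad T_{\text{frame}}(a,c) \le T_{\text{alloc}} \]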

Optimization

Application and CPU reconfiguration are linked:
  Application reconfiguration changes workload
  CPU reconfiguration changes performance
  App config affects optimal CPU configuration ... and vice versa

Two-stage approach:
  1. For each app config, find CPU config and energy
  2. Pick the lowest-energy application configuration

Optimization Algorithm

1. For each app config, find:
   Best CPU config: completes in time, with least energy [MICRO'01]  (requires predicted instruction count)
   CPU energy = instruction count × energy per instruction [MICRO'01]
   Network energy = byte count × energy per byte [WaveLAN, measured]  (requires predicted byte count)
   Total energy = CPU energy + network energy
2. Pick the app config with the lowest total energy
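
A compact C sketch of this two-stage search, with placeholder models standing in for the [MICRO'01] CPU optimizer/energy model and the measured WaveLAN cost; every name and constant here is illustrative, not GRACE code:

/* A hypothetical sketch of the two-stage search; the CPU model and network
 * cost below are placeholders, not the [MICRO'01] optimizer or the measured
 * WaveLAN numbers. */
#include <stddef.h>
#include <float.h>

typedef struct { double freq_ghz; int issue_width; } cpu_cfg_t;
typedef struct { double instr_count, byte_count; } workload_t;  /* predicted */

/* Placeholder: run just fast enough to finish the frame on time, assuming
 * one instruction per cycle; the real search is the [MICRO'01] algorithm. */
cpu_cfg_t best_cpu_config(double instr_count, double frame_time_s)
{
    cpu_cfg_t c = { instr_count / frame_time_s / 1e9, 4 };
    return c;
}

/* Placeholder energy model: energy/instruction grows with frequency, since
 * supply voltage must rise with frequency under DVFS. */
double energy_per_instr(cpu_cfg_t cpu)
{
    return 1e-9 * cpu.freq_ghz * cpu.freq_ghz;
}

#define ENERGY_PER_BYTE 1e-6   /* J/byte; stand-in for the measured value */

/* Stage 1: estimate total energy (CPU + network) for each app config under
 * its best CPU config.  Stage 2: keep the minimum.  Returns the index of
 * the chosen app config and writes its CPU config to *chosen_cpu. */
size_t choose_config(const workload_t *pred, size_t n_app_configs,
                     double frame_time_s, cpu_cfg_t *chosen_cpu)
{
    size_t best = 0;
    double best_energy = DBL_MAX;
    for (size_t i = 0; i < n_app_configs; i++) {
        cpu_cfg_t cpu = best_cpu_config(pred[i].instr_count, frame_time_s);
        double e_cpu  = pred[i].instr_count * energy_per_instr(cpu);
        double e_net  = pred[i].byte_count  * ENERGY_PER_BYTE;
        if (e_cpu + e_net < best_energy) {
            best_energy = e_cpu + e_net;
            best        = i;
            *chosen_cpu = cpu;
        }
    }
    return best;
}

The loop body is stage 1 (one energy estimate per application configuration); keeping the running minimum is stage 2.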

Adaptation Process: Stage 1

[Diagram, built up over several slides: for each application configuration (App. Conf. 1 ... n), predict the next frame's instruction count and byte count; the CPU Optimizer finds a CPU configuration; the CPU Energy Estimator predicts CPU energy and the Network Energy Estimator predicts network energy; their sum is stored in the app configuration energy table (Conf 1 Energy ... Conf n Energy).]

Adaptation Process: Stage 2

[Diagram, built up over several slides: pick the lowest-energy entry from the app configuration energy table (Conf 1 Energy ... Conf n Energy); the CPU Adaptor and Application Adaptor apply the chosen configuration; then capture, encode, and transmit the frame.]

Predictors

How do we predict instructions and bytes?
  Fixed software → use previous-frame data
  Adaptive software → this no longer works!

Solution: offline profiling
  Encode reference sequences offline
  Transition randomly between app configs
  Fit predictors to transitions between configs
  Map last instruction and byte counts to the new app config
  Linear, first-order predictors
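
A sketch of what such a linear, first-order predictor might look like in C; the coefficient table and its indexing are assumptions for illustration, and real coefficients would come from the offline profiling runs described above:

/* One (scale, offset) pair per predicted quantity and per transition from
 * the current app config to a candidate config; fit offline on reference
 * sequences with random config transitions. */
typedef struct { double scale, offset; } lin_pred_t;

/* First-order linear prediction: next = scale * last + offset. */
double predict_count(lin_pred_t p, double last_count)
{
    return p.scale * last_count + p.offset;
}

/* Example use (hypothetical tables): predict next-frame instruction and
 * byte counts for candidate config j, given counts measured under config i:
 *
 *   double instr_next = predict_count(instr_pred[i][j], instr_last);
 *   double bytes_next = predict_count(byte_pred[i][j],  bytes_last);
 */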

Experiments

RSIM CPU simulator:
  State-of-the-art CPU and memory
  Princeton Wattch energy model
  Reported energy typical of modern CPUs

Simulation conditions:
  Fixed and adaptive CPU
  Fixed and adaptive software
  Foreman sequence

Fixed vs Adaptive Systems

Adaptive hardware saves 70% over the fixed system
Adaptive application saves:
  30% on fixed hardware
  20% on adaptive hardware (total savings of 80%)

[Bar chart: total energy in J, split into CPU and network energy, for the Fixed System (30.49 J), Adaptive S/W (21.23 J), Adaptive H/W (7.36 J), and Adaptive Sys (6.25 J).]

Algorithm Comparison

Baseline: fixed software, adaptive hardware
Adaptive software:
  Adaptive DCT/motion thresholds
  Instruction and byte counts for next frame predicted
Oracle:
  Instruction and byte counts for next frame exact
Adapt-Once:
  Adapt once at start of encoding
  Minimize total energy across entire sequence

Algorithm Comparison

[Bar chart: total energy in J, split into CPU and network energy: Fixed 7.36 J, Adapt-Once 6.55 J, Adaptive 6.25 J, Oracle 6.09 J.]

Energy consumption of Adaptive is within 3% of Oracle
  Simple predictors are sufficient for energy savings

Adaptive saves 5% over Adapt-Once
  Frame-by-frame adaptation can save energy

Other test cases

Low-power CPU:
  Network energy dominated
  Software adaptation did not save energy

Carphone sequence:
  Little inter-frame variation
  One-shot adaptation was sufficient
  Adapt-Once, Adaptive, and Oracle use the same energy
  Adaptive software saved ~15%

Conclusions

A new framework for coordinated CPU/application adaptation:
  Combines the benefits of both adaptations
  Preserves separation between layers

Adaptive applications save energy:
  Up to 20% on adaptive hardware
  Up to 30% on fixed hardware