ESSCIRC 2002, Firenze, September 26th
Authors: M. Mancuso, D. Alfonso, A. Artieri, A. Capra, F. Pappalardo, F. Rovati and R. Zafalon
STMicroelectronics
Ultra Low Power Multimedia Processor for Mobile Application
ESSCIRC 2002 - Firenze
Why design low-power circuits/systems?
• Practical reasons:
  - Reducing the power requirements of high-throughput portable applications.
• Financial reasons:
  - Reducing packaging costs and achieving energy savings.
• Technological reasons:
  - Enabling the realization of high-density chips (heat poses serious limitations to circuit complexity and functionality).
Why low-power (II)
• Driving forces:
  - Advent of deep sub-micron technologies.
  - Increasing market share of mobile applications.
  - Limitations of battery technology.
Power Dissipation per Logic Function
[Figure: transistors per cm2 including SRAM (Moore's law, log scale) and power scaling in % (100% at 5 V) versus year, 1986 to 2006]
• Power per logic function scales down much more slowly than integration density grows
Maximum Power Trend (Source: ITRS 2001)
[Figure: maximum power versus year, 1999 to 2006, for high-performance desktop units (70 to 180 W) and portable units (1.0 to 4.0 W)]
• High-performance desktop vs. portable units
• Power will be limited more by system-level cooling and test constraints than by packaging
… Don’t forget Battery Technology
• Battery maximum power and capacity increase:
  - 10-15% per year
• Chip power requirements increase much faster:
  - 35-40% per year (ITRS 2001)
Consequence:
• A widening gap between
  - battery technology capability, and
  - chip power demand
What’s Next?
• The CMOS technology evolution has provided a straightforward path to reduce basic power consumption without remarkable design effort
  - Moore’s law still keeps going (at least until 2010)
  - Technology “brute force” has been easier, faster and more affordable for designers
• Power is really dictating the limit to super-integration
  - For high-performance microprocessors, dissipation will exceed the package limit by 25X in 15 years (Source: ITRS Roadmap, Update 2001)
The Algorithmic Driving Force
Shannon asks for more than Moore can deliver...
[Figure: growth from 1980 to 2020, on a log scale from 1 to 10,000,000, of algorithmic complexity (Shannon’s law), processor performance (Moore’s law) and battery capacity, with the 1G, 2G and 3G generations marked]
Power Density is Close to … a Nuclear Reactor
• Need to design for high performance AND low power at all levels: from circuits to micro-architecture and software
[Figure annotation: P4 @ 1.4 GHz, 75 W]
Courtesy of Fred Pollack, Intel (keynote speech, MICRO-32)
Low Power System Methodology should span the whole range
• Embedded processors: architecture and µ-architecture
• RT-OS: run-time power management & dynamic voltage scaling
• Network-centric power management
  - Power is primarily determined by communication
  - Battery management is key to extending battery life
• Memory hierarchy optimization / SW compilers
• Lossless code/data compression for memory access energy saving
Opportunities for Power Reduction
[Figure: design abstraction levels with the power saving achievable and the runtime required at each level]
Design levels: System, Behavioral, RT, Gate/Logic, Device, Physical
Runtime requirements (scale): Week, Days, Day, Hours, Hour, Minutes
Power savings (scale): 10X, 40-90%, 30-50%, 20-30%, 10-20%, 5-10%
Techniques:
• Algorithms, HW/SW trade-offs, avoiding waste during SW compilation
• Scheduling, allocation, resource sharing & retiming
• Clock gating, operand isolation, precomputation & FSM encoding; voltage scaling
• Technology mapping, low-power library, rewiring, phase assignment & de-glitching; minimize TR x CAPA
• Optimize circuit/layout: buffering & transistor sizing, clock tree
• Multiple/optimum Vt, triple well, SOI
Market Trends for New Mobile Applications
[Figure: share (0% to 100%) by application type, 2002 to 2006]
• Voice (2G)
• Voice & Data (camera, MP3 player) (2.5G)
• Mobile Multimedia (streaming, videoconference, ...) (3G)
Annotations: data traffic exceeds voice traffic; mobile internet; global convergence.
Sources: Handsets: IDC (Q1/02) / PDA: Gartner Dataquest (Q3/01)
NEW MOBILE MULTIMEDIA SERVICES REQUIRE CONVERGENCE
• Low Power
• Real-Time Communication
• Interoperability
• Real-Time Audio/Video
• CMOS Sensor & Imaging
• SW Middleware
• Low Cost
• Graphics
• Internet Access
• OS
• Storage
Application Environment Definition
Application is real-time Video Capture and encoding
[Figure: block diagram with blocks labeled Imager and Multimedia Processor]
System Level Power Reduction: Examples
[Figure: block diagram of the multimedia processor. Block labels and annotations:]
• CMOS sensor; bus encoding; scaling
• Image input pipeline
• Display processing (display-dependent processing ...)
• Audio interface (DDX, IntelliMic ...)
• Embedded memory
• Communication peripherals (Bluetooth, IrDA, USB, GPS ...)
• External memory interface (Flash, SDRAM)
• Application-specific Flash
• Peripherals; smartcard; security
• µP (standard OS support, middleware, Java, API for MM streaming ...)
• Media acceleration (motion estimation, frame compression)
Image Acquisition System
Bus Encoding
• Main features:
  - Use of the correlation between data to improve the coding
  - Bus-Invert (BI) encoder
  - Narrow bus width to amplify the effect of the BI encoder
• Results:
  - Up to 63% less switching activity (compared with a normal 10-bit bus)
[Figure: two block diagrams. Sensor (Bayer pattern, RGRG... / BGBG... rows) -> Encoder -> Image Processing Unit, and Sensor -> Decoder -> Image Processing Unit]
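The bus-invert idea named above can be sketched in a few lines. This is a minimal illustration, not the encoder used in the presented design: the function names, the all-zero initial bus state and the rule "pick whichever option toggles fewer lines, counting the invert line" are assumptions of the sketch. The default width of 5 mirrors the "5 bits per pixel + inverter line" bus mentioned in the slides.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=5):
    """Encode `words` for a `width`-bit bus plus one invert line.

    Returns a list of (bus_value, invert_bit) pairs. The bus and the
    invert line are assumed to start at 0 (assumption of this sketch).
    """
    mask = (1 << width) - 1
    prev_bus, prev_inv = 0, 0
    encoded = []
    for w in words:
        w &= mask
        inverted = ~w & mask
        # Count toggles on the data lines plus a possible toggle on the invert line.
        cost_plain = hamming(prev_bus, w) + (prev_inv != 0)
        cost_inverted = hamming(prev_bus, inverted) + (prev_inv != 1)
        if cost_inverted < cost_plain:
            prev_bus, prev_inv = inverted, 1
        else:
            prev_bus, prev_inv = w, 0
        encoded.append((prev_bus, prev_inv))
    return encoded


def bus_invert_decode(encoded, width=5):
    """Recover the original words: invert the bus value when the invert bit is set."""
    mask = (1 << width) - 1
    return [(bus ^ mask) if inv else bus for bus, inv in encoded]


def raw_transitions(words):
    """Line toggles when the words are sent unencoded, starting from an all-zero bus."""
    total, prev = 0, 0
    for w in words:
        total += hamming(prev, w)
        prev = w
    return total


def encoded_transitions(encoded):
    """Line toggles on the encoded bus, including the invert line."""
    total, prev_bus, prev_inv = 0, 0, 0
    for bus, inv in encoded:
        total += hamming(prev_bus, bus) + (prev_inv != inv)
        prev_bus, prev_inv = bus, inv
    return total


words = [0b00000, 0b11111, 0b00000, 0b11110]
encoded = bus_invert_encode(words)
print(raw_transitions(words), encoded_transitions(encoded))
```

On this deliberately unfavorable toy sequence the encoded bus toggles far fewer lines than the raw one; the saving on real image data depends on the data statistics.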
Bus Encoding: Most Relevant Features
• Tracing communication overheads between the architectural modules: switching activity minimization
  $P_{Bus} = \sum_i \frac{1}{2} \alpha_i C_i V^2 f_{clk}$
• Low-power encoding/decoding with no speed degradation
• Dynamic software profiling during execution
• Evaluation metrics to characterize data streams
• Identify the encoding that best fits the target application
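The bus power expression can be evaluated directly. The sketch below reads α_i as the switching activity of bus line i, C_i as its capacitance, V as the supply voltage and f_clk as the clock frequency (the usual interpretation; the slide does not spell it out). All numeric values are illustrative assumptions, not data from the presentation.

```python
def bus_power(alphas, caps, vdd, f_clk):
    """P_bus = sum_i 0.5 * alpha_i * C_i * V^2 * f_clk, in watts."""
    if len(alphas) != len(caps):
        raise ValueError("one switching activity per line capacitance is required")
    return sum(0.5 * a * c * vdd ** 2 * f_clk for a, c in zip(alphas, caps))


# Hypothetical 10-line bus: activity 0.5 per line, 10 pF per line, 1.8 V, 10 MHz.
baseline = bus_power([0.5] * 10, [10e-12] * 10, vdd=1.8, f_clk=10e6)

# Halving the switching activity halves the bus power, all else being equal.
halved = bus_power([0.25] * 10, [10e-12] * 10, vdd=1.8, f_clk=10e6)

print(f"baseline: {baseline * 1e3:.3f} mW, halved activity: {halved * 1e3:.3f} mW")
```

Because the expression is linear in α_i, any percentage reduction in switching activity translates into the same percentage reduction in dynamic bus power.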
Bus Encoding Technique Overview
• Redundant encoding:
  - T0 code
  - Bus-Invert code
  - Memory-less adaptive partial bus-invert code
  - Memory adaptive partial bus-invert code
  - T0-Xor-Offset code
• Irredundant encoding:
  - T0-Xor code
  - Offset code
  - Offset-Xor code
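As an illustration of one redundant code from the list, the sketch below follows the T0 code as it is usually described for address buses: when the new address equals the previous one plus a fixed stride, the bus lines are frozen and a redundant INC line is asserted. The function names and the default stride of 1 are assumptions of the sketch.

```python
def t0_encode(addresses, stride=1):
    """Return (bus_value, inc_bit) pairs for a stream of addresses."""
    encoded = []
    prev_addr = None
    bus = 0
    for addr in addresses:
        if prev_addr is not None and addr == prev_addr + stride:
            # In-sequence address: keep the bus frozen and assert INC.
            encoded.append((bus, 1))
        else:
            # Out-of-sequence address: drive it on the bus with INC low.
            bus = addr
            encoded.append((bus, 0))
        prev_addr = addr
    return encoded


def t0_decode(encoded, stride=1):
    """Rebuild the address stream from (bus_value, inc_bit) pairs."""
    decoded = []
    prev_addr = None
    for bus, inc in encoded:
        addr = prev_addr + stride if inc else bus
        decoded.append(addr)
        prev_addr = addr
    return decoded


addresses = [100, 101, 102, 103, 200, 201, 50]
encoded = t0_encode(addresses)
print(encoded)
```

During a run of consecutive addresses the data lines do not toggle at all, which is why this family of codes suits instruction-address streams.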
Bus Encoding: Software Execution Profiling
• Power Tracer Tool:
  - Trace the transition activity of system-level buses during the execution of benchmark programs.
  - Analyze bus traces in terms of evaluation metrics.
  - Implement bus encoding techniques.
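The kind of analysis such a tracer performs can be sketched as follows. This is not the Power Tracer tool itself; the function names and the two metrics (total transitions and per-line switching activity) are assumptions chosen to illustrate how a recorded bus trace might be characterized.

```python
def total_transitions(trace):
    """Total number of line toggles between consecutive bus words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(trace, trace[1:]))


def line_activities(trace, width):
    """Per-line switching activity: fraction of cycles in which each line toggles."""
    if len(trace) < 2:
        return [0.0] * width
    toggles = [0] * width
    for prev, cur in zip(trace, trace[1:]):
        diff = prev ^ cur
        for bit in range(width):
            toggles[bit] += (diff >> bit) & 1
    cycles = len(trace) - 1
    return [t / cycles for t in toggles]


# The same five values visited in binary order and in Gray-code order.
binary_trace = [0b000, 0b001, 0b010, 0b011, 0b100]
gray_trace = [0b000, 0b001, 0b011, 0b010, 0b110]

print(total_transitions(binary_trace), total_transitions(gray_trace))
print(line_activities(gray_trace, width=3))
```

Comparing the metrics for alternative encodings of the same trace is one simple way to pick the encoding that best fits a given application.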
Bus Encoding: Data Reordering
[Figure: pixel ordering before bus encoding; classical approach versus proposed approach]
Sensor, 10 bits per pixel; pixel order:
  16 15 14 13 12 11 10  9
   8  7  6  5  4  3  2  1
Bus, 5 bits per pixel + inverter line; order:
  24 20 23 19 22 18 21 17
  32 28 31 27 30 26 29 25
   8  4  7  3  6  2  5  1
  16 12 15 11 14 10 13  9
Other labels: Hamming distance, Neighborhood, Levels.
Decoder: $out(t) = b(t)$ when the inverter line is 0, $out(t) = \overline{b(t)}$ when it is 1
Local levels distribution at low frequencies
Bus Encoding: Results
SWITCHING ACTIVITY REDUCTION
BUFFER SIZE    BUS 10 BITS    BUS 5 BITS
raster mode       7.70%         16.95%
2 pixels         21.16%         31.63%
4 pixels         27.40%         40.12%
8 pixels         30.05%         44.72%
1 line           33.30%         63.73%
SCALING: Where? (1/2)
The quality of stills requires sensors with higher resolution than video. Consequently, the sensor and the IGP work at maximum resolution (VGA, for example) even if the video has a lower resolution (QCIF, for example). Scaling algorithms can therefore play a key role in power consumption.
[Figure: sensor (VGA, for example) -> VGA 15 frames/s -> color processing -> VGA 15 frames/s -> scaling -> QCIF 15 frames/s]
SCALING: Where? (2/2)
Un-optimized version (Image Generation Unit):
  Sensor -> [VGA Bayer, 10 bits, 5.6 MB/sec] -> Image Generation Pipeline -> [VGA RGB, 8 bits, 13.2 MB/sec] -> Scaling -> [QCIF RGB, 8 bits, 1.1 MB/sec]
Optimized version (Image Generation Unit):
  Sensor -> [VGA 15 fps Bayer, 10 bits, 5.6 MB/sec] -> Scaling -> [QCIF 15 fps Bayer, 10 bits, 0.45 MB/sec] -> Image Generation Pipeline -> [QCIF 15 fps RGB, 8 bits, 1.1 MB/sec]
In the optimized version the IGP performs fewer operations per second and the required bandwidth is reduced.
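The bandwidth figures of the two pipelines can be checked with simple arithmetic. The sketch below assumes the standard VGA (640x480) and QCIF (176x144) resolutions, 15 frames per second, one 10-bit sample per pixel for Bayer data, three 8-bit samples per pixel for RGB, and 1 MB = 2^20 bytes; these conventions are assumptions, since the slide does not state them.

```python
VGA = (640, 480)     # standard VGA resolution
QCIF = (176, 144)    # standard QCIF resolution
MB = 2 ** 20         # assumption: 1 MB = 2**20 bytes


def bandwidth_mb_per_s(resolution, fps, bits_per_pixel):
    """Raw data rate of a video stream in MB/s."""
    width, height = resolution
    return width * height * fps * bits_per_pixel / 8 / MB


stages = {
    "VGA Bayer, 10 bits": bandwidth_mb_per_s(VGA, 15, 10),
    "VGA RGB, 3 x 8 bits": bandwidth_mb_per_s(VGA, 15, 24),
    "QCIF Bayer, 10 bits": bandwidth_mb_per_s(QCIF, 15, 10),
    "QCIF RGB, 3 x 8 bits": bandwidth_mb_per_s(QCIF, 15, 24),
}
for name, bw in stages.items():
    print(f"{name}: {bw:.2f} MB/s")
```

Under these assumptions the RGB and QCIF Bayer rates come out close to the 13.2, 1.1 and 0.45 MB/sec quoted above, while the VGA Bayer rate comes out slightly below 5.6 MB/sec, which suggests the slide used a slightly different convention for that stage. Either way, scaling before the image generation pipeline replaces the 13.2 MB/sec intermediate stream with one of about 0.45 MB/sec.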
SCALING: Results
[Figure: three scaled images for visual comparison]
• Scaling a VGA image with the bi-cubic algorithm
• Scaling a VGA image with the polyphase algorithm
• Scaling a Bayer-pattern VGA image with the ST proprietary algorithm
Motion Estimation
Motion estimation plays a critical role in video encoding.
SLIMPEG motion vector field building process:
• Motion estimation vectors
• Motion-compensated noise reduction
• Scene cut detection
• 3:2 pulldown detection
• Concealment motion vectors
• Interlaced/progressive detection
• Adaptive search windows for unconstrained search
The ST solution (SLIMPEG) offers the following advantages:
• Low power
• Picture quality & true motion
• Low bandwidth
• Low complexity
• Search window independence
[Figure: two pie charts, MIPS and memory bandwidth, each split into Motion Estimation and Other]
Motion Estimation: Complexity
• Algorithms compared: SLIMPEG, Three Step, Densely Centered Uniform P-Search [*], Fast Search (already included in the standard, based on a heuristic search) and Full Search.
• Figures are the number of matchings per QCIF frame; values take border effects into account
• SLIMPEG shows the lowest, and constant, complexity: 99% gain vs. Full Search
                Slimpeg   3-step hierarch.   Fast Search   D.C.U.P-S   Full Search
Foreman          1,247         2,707            4,998        16,927       78,231
Coastguard       1,247         2,692            4,356        16,927       78,231
Miss America     1,247         2,655            2,665        16,927       78,231
[*] B. Furht, J. Greenberg, R. Westwater, "Motion Estimation Algorithms for Video Compression", Kluwer Academic Publishers, 1997
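For reference, the Full Search baseline in the table can be sketched as an exhaustive block-matching loop. This is a generic textbook formulation, not SLIMPEG and not the code behind the figures above: the SAD cost, the function names, the 8x8 toy block and the synthetic frames are all assumptions of the sketch. Candidates that fall outside the reference frame are skipped, which is the border effect the slide mentions.

```python
import random


def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, bs, w):
    """Exhaustive search over displacements in [-w, w] x [-w, w].

    Returns (best_vector, best_sad, matchings), where `matchings` counts
    the candidates actually evaluated after border clipping.
    """
    height, width = len(ref), len(ref[0])
    best_vec, best_sad, matchings = None, None, 0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            inside = 0 <= bx + dx <= width - bs and 0 <= by + dy <= height - bs
            if not inside:
                continue  # candidate block would leave the reference frame
            matchings += 1
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best_sad is None or cost < best_sad:
                best_vec, best_sad = (dx, dy), cost
    return best_vec, best_sad, matchings


# Toy frames: the current frame is the reference shifted by (2, 1).
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
cur = [[ref[min(y + 1, 31)][min(x + 2, 31)] for x in range(32)] for y in range(32)]

vec, cost, n = full_search(cur, ref, bx=8, by=8, bs=8, w=4)
_, _, n_corner = full_search(cur, ref, bx=0, by=0, bs=8, w=4)
print(vec, cost, n, n_corner)
```

The number of matchings grows with the square of the search range, which is the cost that reduced-complexity searches try to avoid.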
Motion Estimation: Stable Complexity
Low and fixed number of operations per macroblock
[Figure: matchings per macroblock (log scale, 1.E+00 to 1.E+05) vs. search window size (0 to 128) for Full Search, Logarithmic and Proposed]
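The shape of the curves can be approximated with textbook formulas. The sketch below assumes that a search window size p means displacements of up to ±p pixels, ignores border effects, and models the logarithmic search as an n-step search with 9 points in the first step and 8 new points in each later step; none of these conventions is stated on the slide. The constant for the proposed method is derived from the previous slide's 1,247 matchings per QCIF frame, divided by the 99 macroblocks of a QCIF frame.

```python
def full_search_matchings(p):
    """Candidates tested by an exhaustive search over [-p, p] x [-p, p]."""
    return (2 * p + 1) ** 2


def log_search_matchings(p):
    """Candidates tested by an n-step (logarithmic) search covering +/-p."""
    steps = p.bit_length()  # equals ceil(log2(p + 1)) for p >= 1
    return 1 + 8 * steps


PROPOSED_PER_MB = 1247 / 99  # derived from the per-frame figure, about 12.6

for p in (7, 15, 31, 63, 127):
    print(p, full_search_matchings(p), log_search_matchings(p), round(PROPOSED_PER_MB, 1))
```

Full search grows quadratically with the window, the logarithmic search grows slowly, and a fixed-cost method stays flat, which matches the qualitative picture in the plot.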
Motion Estimation: Quality Achieved
[Figure: Y PSNR gain in dB (-2.00 to 2.00) of the proposed method over Full Search and over PMVFAST; x-axis labels: carphone, children, foreman, monitor, missa, mother, news, renata, silent, teeny, Average; QCIF, 64 kbit/s, 15 fps]
Motion Estimation Quality Achieved (2)
Comparison between Full Search and proposed method against scene changes
[Figure: Y PSNR in dB (30 to 36) versus frame number (1 to 251) for Full Search and Proposed]
Frame Buffer Compression
[Figure: block diagrams of Coder C and Decoder D, built from a quantizer Q, an inverse quantizer Q^-1, a predictor P and adders; signals x_n, x̂_n, e_n and ê_n]
• Fixed compression ratio of 50%
• No mismatch between encoder and decoder
• Quality drop well masked by MPEG quantization noise
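The diagram appears to show a predictive (DPCM-style) coder, in which the prediction error is quantized and the predictor runs on reconstructed samples. The sketch below illustrates that general idea only; the previous-sample predictor, the uniform quantizer with step 16, the 4-bit codes (8-bit samples to 4 bits, hence 50%) and all names are assumptions, not the scheme used in the presented design.

```python
STEP = 16               # quantizer step (assumption of this sketch)
Q_MIN, Q_MAX = -8, 7    # 4-bit signed codes: 8-bit samples become 4-bit codes


def quantize(error):
    """Map a prediction error to the nearest quantizer level, clipped to 4 bits."""
    q = (error + STEP // 2) // STEP
    return max(Q_MIN, min(Q_MAX, q))


def reconstruct(prediction, q):
    """Rebuild a sample from the prediction and a quantized error code."""
    return max(0, min(255, prediction + q * STEP))


def encode(samples, first_prediction=128):
    """Return (codes, reconstructed samples as the decoder will see them)."""
    codes, recon = [], []
    prediction = first_prediction
    for x in samples:
        q = quantize(x - prediction)
        # Predict from the reconstructed value, exactly as the decoder does,
        # so encoder and decoder never drift apart.
        prediction = reconstruct(prediction, q)
        codes.append(q)
        recon.append(prediction)
    return codes, recon


def decode(codes, first_prediction=128):
    """Rebuild the samples from the 4-bit codes."""
    out = []
    prediction = first_prediction
    for q in codes:
        prediction = reconstruct(prediction, q)
        out.append(prediction)
    return out


samples = [128, 130, 135, 140, 138]
codes, recon = encode(samples)
print(codes, decode(codes))
```

Because the encoder predicts from the same reconstructed values that the decoder computes, the two stay in lockstep, which is one way to obtain the "no mismatch" property listed above.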
50% Bandwidth Saving
[Figure: mean bandwidth in MB/s (0.0 to 1.2) per sequence (carphone, children, foreman, monitor, missa, mother, news, renata, silent, teeny), with the Full Search bandwidth as reference]
Average bandwidth of the proposed solution versus Full Search, without and with memory compression
Minimal Quality Loss
[Figure: three images: without compression, with compression, and their difference]
VIDEO CODECS: An Optimized Implementation
• Mix of host SW, FW and HW
• Low operating frequency:
  - QCIF codec, 15 Hz: 3 MHz
  - VGA encode, 30 Hz: 40 MHz
• Ultra low power (0.13 µm, ULL, 0.9 V):
  - QCIF 15 Hz decode: < 1 mW
  - QCIF 15 Hz encode: 3 mW
  - VGA 30 Hz encode: 40 mW
[Figure: block diagram with Host, HW accelerator, two AHB buses, system memory, sensor and IT line]
Required MIPS on ARM
• Codec drivers run on the ARM CPU.
• QCIF 15 Hz video decoder requirements:
  - Video IT routine: 0.06 MIPS
  - Video decode driver: 0.13 MIPS
  - Video display driver: 0.03 MIPS
• Only 0.2 MIPS required for video codec control (0.07% of ARM CPU)
• Note:
  - 1 VAX MIPS is equivalent to 1757 Dhrystone/s; ARM9 @ 264 MHz: 290 MIPS
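The CPU-load figure follows from the numbers above by simple arithmetic, sketched here using only values quoted on the slide (the variable names are illustrative).

```python
driver_mips = {
    "Video IT routine": 0.06,
    "Video decode driver": 0.13,
    "Video display driver": 0.03,
}
ARM9_MIPS = 290.0  # ARM9 @ 264 MHz, as quoted on the slide

total = sum(driver_mips.values())
share = 100.0 * total / ARM9_MIPS
print(f"total: {total:.2f} MIPS, {share:.3f}% of the CPU")
```

The three drivers add up to 0.22 MIPS, which the slide rounds to 0.2 MIPS; either value is well below 0.1% of the available CPU throughput.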
Conclusions
• System-level power reduction examples have been presented in the context of mobile multimedia
• ST bus encoding of Bayer data, combined with the optimized scaling, achieves a saving of more than 93% in data throughput
• The ST motion estimator achieves remarkable savings in computation workload (99% less than Full Search), internal and external memory size, silicon area and bandwidth requirements on the system bus, with quality comparable to the common Full Search approach