Video Coding
TSBK01 Image Coding and Data Compression
Lecture 10
Jörgen Ahlberg
Outline
I. Colour coding
II. Moving images: From 2D to 3D?
III. Hybrid coding
IV. Video coding standards
Part I: Colour Coding
The base colours of colour television are
– Red: 700 nm
– Green: 546 nm
– Blue: 435 nm
Three base colours are enough to synthesize any visible colour!
[Figure: colour space with axes R, G and B]
The Colour Vector
In this plane, the luminance Y = R+G+B = 1
The PAL colours
Y = 0.30R + 0.59G + 0.11B
Cr = 0.70R - 0.59G - 0.11B
Cb = - 0.30R - 0.59G + 0.89B
Y luminance; Cr, Cb chrominance
[Block diagram: a matrix maps the inputs R, G, B to the outputs Y, R-Y, B-Y]
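The change of basis can be sketched in a few lines. This is a minimal illustration that assumes the luminance weights 0.30, 0.59 and 0.11 for R, G and B and components in the range 0 to 1; the function names are hypothetical, and practical systems also scale and offset the chrominance components.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one RGB triple (components in 0..1) to (Y, Cr, Cb)."""
    y = 0.30 * r + 0.59 * g + 0.11 * b   # luminance
    cr = r - y                           # R - Y
    cb = b - y                           # B - Y
    return y, cr, cb


def ycrcb_to_rgb(y, cr, cb):
    """Invert rgb_to_ycrcb."""
    r = y + cr
    b = y + cb
    g = (y - 0.30 * r - 0.11 * b) / 0.59  # solve the luminance equation for G
    return r, g, b


# White has full luminance and (numerically almost) zero chrominance.
print(rgb_to_ycrcb(1.0, 1.0, 1.0))
```

Any grey value (R = G = B) gives Cr = Cb = 0, which is why the chrominance components carry only the colour information.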
Change basis to YUV (almost the same as YCrCb).
– For more info on color spaces, see colour FAQ at www.poynton.com/Poynton-color.html
The Human Visual System perceives the luminance in higher resolution than the chrominance!
Subsample the colour components.
Digital Colour Coding
[Figure: sampling layouts of the Y, U and V components for 4:2:0 and 4:2:2]
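A rough sketch of chrominance subsampling follows. It assumes that a chrominance plane is a list of rows with even dimensions and that plain averaging is an acceptable low-pass filter; the function names are hypothetical, and real systems may use other filters and sample positions.

```python
def subsample_422(plane):
    """4:2:2: halve the horizontal chroma resolution by averaging pairs."""
    return [[(row[x] + row[x + 1]) / 2 for x in range(0, len(row), 2)]
            for row in plane]


def subsample_420(plane):
    """4:2:0: halve the chroma resolution in both directions (2x2 averages)."""
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1] +
              plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


# Average number of samples per pixel (Y + U + V) for each scheme.
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}

u = [[10, 20, 30, 40],
     [10, 20, 30, 40]]
print(subsample_422(u))
print(subsample_420(u))
```

With 4:2:0 the data volume is halved compared with full-resolution colour, before any actual compression has been applied.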
Part II: Coding of Moving Images
Principle I - Extend known methods to 3D
Coding method     Performance (bpp)   Complexity   Decoding complexity
PCM               6 – 8               Low          Low
VQ                0.5 – 2             Very high    Low
Predictive        2 – 5               Low          Low
Transform         0.5 – 1.5           High         High
Subband/Wavelet   0.1 – 1.0           High         High
Fractal           0.1 – 0.5           Very high    Low
Extending 2D Methods
Predictive coding
– 3D predictors
– Motion compensated predictors
Transform coding
– 3D transforms
Subband coding
– 3D subband filters
BUT! The properties of the image signal are different in the temporal and the spatial domain!
Thus:
Principle II:
Hybrid methods
Hybrid predictive/transform coding is very popular.
Part III: Hybrid Coding
Combine predictive coding and transform coding.
Use predictive coding to predict the next frame in the sequence.
Use transform coding to code the prediction error.
Transform Coding
[Block diagram: T → Q → VLC]
T: Transform
Q: Quantizer
VLC: Variable Length Coder
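The T and Q stages can be illustrated in one dimension. The sketch below assumes an orthonormal DCT-II as the transform and a uniform quantizer with step size `step`; the function names are hypothetical, and the VLC stage is left out.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a list of samples (T)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_1d(c):
    """Inverse of dct_1d."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) *
                 math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, n))
        out.append(s)
    return out


def quantize(coeffs, step):
    """Uniform quantizer (Q): map each coefficient to an integer level."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Reconstruct coefficient values from the integer levels."""
    return [level * step for level in levels]


levels = quantize(dct_1d([10] * 8), step=4)
print(levels)                              # only the DC level is non-zero
print(idct_1d(dequantize(levels, step=4)))
```

A flat signal ends up in a single non-zero coefficient, which is the kind of energy compaction that makes the following variable length coding efficient.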
Predictive Coding
[Block diagram: prediction loop with Q, Q-1 and P, followed by VLC]
Q: Quantizer
Q-1: Inverse quantizer (reconstructor)
P: Predictor
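A minimal sketch of the prediction loop, assuming the simplest possible predictor (the previous reconstructed sample) and a uniform quantizer; the function names are hypothetical. The point is that the encoder predicts from reconstructed values, exactly as the decoder will, so quantization errors do not accumulate.

```python
def dpcm_encode(samples, step):
    """Closed-loop predictive coding of a list of samples."""
    levels = []
    prediction = 0
    for s in samples:
        error = s - prediction              # prediction error
        level = round(error / step)         # Q
        levels.append(level)
        prediction += level * step          # Q-1 and P: reconstructed sample
    return levels


def dpcm_decode(levels, step):
    """Rebuild the samples by running the same predictor as the encoder."""
    out = []
    prediction = 0
    for level in levels:
        prediction += level * step
        out.append(prediction)
    return out


samples = [10, 12, 15, 15, 11]
decoded = dpcm_decode(dpcm_encode(samples, step=2), step=2)
print(decoded)
```

Each decoded sample differs from the original by at most half a quantizer step, regardless of how long the sequence is.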
Hybrid Coding
[Block diagram: prediction loop with T, Q, Q-1, T-1 and P, followed by VLC]
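The hybrid loop can be sketched on one-dimensional "frames". This is a toy illustration under several assumptions: the predictor is simply the previous reconstructed frame (no motion compensation), the transform is an orthonormal DCT-II, the quantizer is uniform, and the VLC is omitted. All names are hypothetical.

```python
import math


def dct(x):
    """Orthonormal DCT-II (T)."""
    n = len(x)
    return [(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)) *
            sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct (T-1)."""
    n = len(c)
    return [sum((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)) * c[k] *
                math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def reconstruct(reference, levels, step):
    """Q-1, T-1, then add the prediction: shared by encoder and decoder."""
    residual = idct([level * step for level in levels])
    return [r + d for r, d in zip(reference, residual)]


def hybrid_encode(frames, step):
    """Predict each frame from the previous reconstruction, code the error."""
    reference = [0.0] * len(frames[0])      # predictor memory (P)
    stream = []
    for frame in frames:
        error = [f - r for f, r in zip(frame, reference)]
        levels = [round(c / step) for c in dct(error)]   # T and Q
        stream.append(levels)
        reference = reconstruct(reference, levels, step)
    return stream


def hybrid_decode(stream, step):
    """Mirror the encoder's reconstruction loop."""
    reference = [0.0] * len(stream[0])
    frames = []
    for levels in stream:
        reference = reconstruct(reference, levels, step)
        frames.append(reference)
    return frames


frames = [[10.0] * 8, [10.0] * 8, [12.0] * 8]
stream = hybrid_encode(frames, step=1)
print(stream)
```

The second frame equals the first, so its prediction error quantizes to all zeros and costs almost nothing to transmit.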
Frame Prediction
[Figure: an intra-coded I-frame followed by predictively coded P-frames]
Better prediction if it can compensate for motion!
Motion Compensation
Motion Compensated Hybrid Coding
[Block diagram: prediction loop with TQ, TQ-1 and P, with ME and VLC blocks]
ME: Motion estimation
TQ: Transform + quantization
Motion Compensation
Typically one motion vector per macroblock (4 transform blocks)
Motion estimation is a time consuming process
– Hierarchical motion estimation
– Maximum length of motion vectors
– Clever search strategies
Motion vector accuracy:
– Integer, half or quarter pixel
– Bilinear interpolation
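To show why motion estimation is time consuming, here is a sketch of the most basic approach: an exhaustive (full) search with integer-pixel accuracy. It assumes frames stored as lists of rows and uses the sum of absolute differences (SAD) as the matching criterion; the function names and the choice of SAD are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of the current
    frame at (bx, by) and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, max_disp):
    """Try every displacement up to max_disp; return (dx, dy, cost)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


# A bright 2x2 object moves one pixel right and one pixel down.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (3, 4):
        ref[y][x] = 255
        cur[y + 1][x + 1] = 255

print(full_search(cur, ref, bx=4, by=2, n=4, max_disp=2))
```

The search evaluates up to (2·max_disp + 1)² candidates per block, each costing n² operations, which is what hierarchical estimation, limited vector lengths and clever search strategies try to reduce.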
Part IV: Video Coding Standards
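[Figure: bitrate scale with marks at 8, 16, 64 and 384 kbit/s and at 1.5, 5 and 20 Mbit/s]
Ranges: very low bitrate, low bitrate, medium bitrate, high bitrate
Applications: mobile videophone, videophone over PSTN, ISDN videophone, Video CD, digital TV, HDTV
Standards: MPEG-4, H.263, H.261, MPEG-1, MPEG-2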
Standards
H.26x
– Standards for real time communication like video telephony and video conferencing.
– Standardized by ITU.
MPEG
– Standards for stored video data like movies on CDs, DVDs, etc.
– Standardized by ISO.
H.261
Standard for ISDN picture phones in 1990.
Motion compensation:
– One motion vector per macroblock.
– One macroblock = four 8×8 luminance blocks + two chrominance blocks (one U and one V).
– Motion vectors max 15 pixels long in each direction.
Format:
– CIF (352×288) or QCIF (176×144)
– 7.5 – 30 frames/s.
Bitrate: Multiple of 64 kbit/s (=ISDN) including audio.
Quality: Acceptable for small motion at 128 kbit/s.
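A quick calculation puts these numbers in perspective. The sketch assumes 8 bits per sample and 4:2:0 subsampling (1.5 samples per pixel on average, matching the macroblock structure above); the helper names are hypothetical.

```python
def macroblocks_per_frame(width, height, mb_size=16):
    """Number of 16x16 macroblocks (four 8x8 luminance blocks each)."""
    return (width // mb_size) * (height // mb_size)


def raw_bitrate(width, height, fps, bits_per_sample=8, samples_per_pixel=1.5):
    """Uncompressed bitrate in bit/s."""
    return width * height * samples_per_pixel * bits_per_sample * fps


cif_rate = raw_bitrate(352, 288, 30)
print(macroblocks_per_frame(352, 288))     # CIF
print(macroblocks_per_frame(176, 144))     # QCIF
print(cif_rate / 1e6, "Mbit/s uncompressed")
print(cif_rate / 128_000, "times more than 128 kbit/s")
```

Uncompressed CIF at 30 frames/s needs about 36.5 Mbit/s, so a 128 kbit/s channel that also carries audio requires a compression ratio of roughly 285 or more.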
H.263
Standard for picture telephones over analog subscriber lines in 1995.
Format:
– CIF, QCIF or Sub-QCIF.
– Usually less than 10 frames/s.
Bitrate: Typically 20 – 30 kbit/s.
Quality: With new options as good as H.261 (at half the bitrate).
MPEG
Moving Picture Experts Group – a committee under ISO and IEC.
Original plan:
– MPEG-1 for 1.5 Mbit/s (Video CD)
– MPEG-2 for 10 Mbit/s (Digital TV)
– MPEG-3 for 40 Mbit/s (HDTV)
What happened:
– MPEG-1 for 1.5 Mbit/s (Video CD)
– MPEG-2 for 2 – 60 Mbit/s (TV and HDTV)
– MPEG-4, -7 and -21 for other things.
MPEG-1
ISO/IEC standard in 1991.
Target bitrate around 1.5 Mbit/s (Video CD).
Properties:
– Bi-directionally predictively coded frames (”B-frames”, see next slide).
– More flexible than H.261.
– Almost JPEG for intra frames.
Format:
– CIF
– No interlace.
– 24 – 30 frames/s.
MPEG Frame Types
I B B P B B P B B P B B I
[Figure labels: intra-coded I-frame; predictively coded P-frames; bi-directionally predictively coded B-frames; group of frames (GOF)]
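Because a B-frame is predicted from a later frame as well as an earlier one, the frames cannot be coded in display order. A minimal sketch of the reordering, assuming frames are represented by labels such as "I0" or "B1" (a hypothetical convention):

```python
def coding_order(display):
    """Reorder frame labels so that every B-frame comes after both of the
    reference frames (I or P) it is predicted from."""
    out = []
    pending_b = []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)        # wait for the next reference frame
        else:
            out.append(frame)              # I or P: code it first
            out.extend(pending_b)          # then the B-frames that needed it
            pending_b = []
    out.extend(pending_b)                  # trailing B-frames, if any
    return out


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(coding_order(display))
```

The decoder has to buffer frames and restore the display order, which adds delay; this is one reason why B-frames suit stored video better than real-time communication.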
MPEG-coding of I-frames
Intracoded
8×8 DCT
Arbitrary weighting matrix for coefficients
Predictive coding of DC-coefficients
Uniform quantization
Zig-zag, run-level, entropy coding
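The zig-zag scan and run-level step can be sketched as follows. The code assumes a block of already quantized coefficients stored as a list of rows; the function names are hypothetical, and the final entropy coding of the pairs is not shown.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):             # s = row + col of an anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are traversed upwards, odd diagonals downwards.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def run_level(block):
    """Return the DC level and the (run, level) pairs of the AC coefficients,
    where run is the number of zeros preceding each non-zero level."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs = []
    run = 0
    for level in scan[1:]:                 # the DC coefficient is coded separately
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return scan[0], pairs                  # trailing zeros are not coded


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 12, 5, -3
print(run_level(block))
```

After quantization most high-frequency coefficients are zero, so a whole 8×8 block often reduces to a handful of pairs.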
MPEG-coding of P-frames
Motion compensated prediction from I- or P-frame.
Half-pixel accuracy of motion vectors, bilinear interpolation.
Predictive coding of motion vectors.
Prediction error coded as I-frame.
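Half-pixel accuracy means that the predicted block may be fetched from positions between the pixels of the reference frame. A minimal sketch, assuming motion vectors given in half-pixel units, non-negative coordinates inside the frame, and floating-point averaging (real coders use integer arithmetic with defined rounding); the function names are hypothetical.

```python
def sample_half_pel(ref, x2, y2):
    """Sample the reference frame at (x2 / 2, y2 / 2) by bilinear
    interpolation; x2 and y2 are coordinates in half-pixel units."""
    x0, y0 = x2 // 2, y2 // 2
    x1 = x0 + (x2 % 2)                     # right neighbour only at half positions
    y1 = y0 + (y2 % 2)                     # lower neighbour only at half positions
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4


def predict_block(ref, bx, by, n, mvx2, mvy2):
    """Motion compensated prediction of the n x n block at (bx, by) using a
    motion vector (mvx2, mvy2) in half-pixel units."""
    return [[sample_half_pel(ref, 2 * (bx + x) + mvx2, 2 * (by + y) + mvy2)
             for x in range(n)]
            for y in range(n)]


ref = [[0, 10, 20],
       [0, 10, 20],
       [0, 10, 20]]
print(predict_block(ref, bx=0, by=0, n=2, mvx2=1, mvy2=0))
```

At integer positions the four terms coincide and the pixel is returned unchanged; at half positions the result is the average of two or four neighbours.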
MPEG-coding of B-frames
Motion compensated prediction from two consecutive I- or P-frames.
– Forward prediction only (1 vector/macroblock).
– Backward prediction only (1 vector/macroblock).
– Average of fwd and bwd (2 vectors/macroblock).
Otherwise as P-frames.
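The choice between the three prediction modes can be sketched per macroblock. The example assumes that the forward and backward predictions have already been motion compensated and picks the mode with the smallest sum of absolute differences; a real encoder would probably also weigh the cost of sending a second motion vector. The names are hypothetical.

```python
def block_sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_b_prediction(block, fwd, bwd):
    """Pick forward, backward or averaged prediction for one block."""
    avg = [[(f + b) / 2 for f, b in zip(rf, rb)] for rf, rb in zip(fwd, bwd)]
    candidates = {"forward": fwd, "backward": bwd, "average": avg}
    mode = min(candidates, key=lambda m: block_sad(block, candidates[m]))
    return mode, candidates[mode]


# A block that fades from 8 (previous frame) to 12 (next frame).
block = [[10, 10], [10, 10]]
fwd = [[8, 8], [8, 8]]
bwd = [[12, 12], [12, 12]]
print(best_b_prediction(block, fwd, bwd)[0])
```

Averaging helps with fades and noise, while the one-directional modes handle content that is visible in only one of the two reference frames.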
MPEG-2
ISO/IEC standard in 1994.
Properties:
– Handles interlace (optimized for TV)
– Even more flexible than MPEG-1
Format:
– 352×288
– 704×576 (25 frames/s) or 720×480 (30 frames/s)
– 1440×1152 or 1920×1080 (HDTV)
Bitrate:
– 2 – 60 Mbit/s
– ~4 Mbit/s: Image quality similar to PAL / NTSC / SECAM.
– 18 – 20 Mbit/s: HDTV.
MPEG-2 (cont.)
Profiles:
– Simple profile without B-frames.
– Scalable profiles.
Experience shows that:
– At 1.5 – 2 Mbit/s MPEG-2 is not better than MPEG-1.
– With manual interaction at the coding, good quality can be achieved at 3 – 4 Mbit/s.
– Problems with implementing the full standard have caused compatibility problems.
– Buffering and rate control are hard problems.
MPEG-4
ISO/IEC standard in 1998, version 2 in 1999
Instead of frames as coding units, MPEG-4 uses audio-visual objects
Focus is not primarily on compression, but on content-based functionality
Contains definitions of:
– Media object types (video, audio, text, graphics, ...)
– Parameters for describing the objects
– Bitstream syntax for the (compressed) parameters
– Scene description, file format, streaming, synchronization, ...
Allows mixing of media objects.
Parts of the MPEG-4 standard
Part 1, Systems, contains
– The bitstream syntax and the binary ”language” for scene description
– Computer graphics object descriptions
– Multiplexing, transport, ...
Part 2, Visual, contains
– Video coding
– Still image coding
– Texture coding, ...
Part 3, Audio, contains a toolbox of audio coders for different applications
...
Structure of an MPEG-4 Decoder
[Block diagram: Bitstream → MUX → A/V object decoders (several in parallel) → Compositor → Audio/Video scene]
MPEG-4 (Natural) Video
Instead of frames: Video Object Planes
Coded with Shape Adaptive DCT
[Figure: a video frame split into a background VOP and two further VOPs; alpha map; SA DCT]
MPEG-4 Video Coding
[Block diagram: prediction loop with TQ, TQ-1 and Predictor, plus motion estimation and shape coding, with VLC blocks feeding a Mux]
TQ: Transform + quantization
Synthetic/Natural Hybrid Coding
Mix traditional video with 2D/3D graphics
– Compose virtual environments
– Easy to add text, graphs, images, etc
High compression
Receive objects from separate sources
– Use predefined or locally defined objects
Scalability
– Progressive decoding
– Better terminal gives better quality.
Synthetic Objects
2D/3D graphics
– Lines, polygons
– Still images
– Image/video mapping on polygon meshes
VRML scenes and objects
Animated people
More on animation and virtual characters in Lecture 12!
Synthetic audio
More on natural and synthetic audio in Lecture 11!
Computer graphics generated virtual environment
Natural video object
Natural video object mapped on 2D mesh
Still image or natural video object mapped on animated 3D mesh
All mixed in the decoder!!!
Virtual Environments
Downloaded virtual environment
Different environments for different users
Simple change between environments
Synthetic environments are cheaper than real ones
Tools for Synthetic Objects
Wavelet-based still image compression
– Scalable quality and resolution
– Progressive decoding
– Can be mapped on 2D or 3D meshes
Compression of 2D and 3D meshes
– Mesh geometry and animation
– Transmit vertex coordinates and let the receiving terminal calculate the polygons
– A moving or still image can be mapped on the mesh (texture mapping).
More Tools for Synthetic Objects
Face and Body Animation
Text-to-speech (TTS) interface
View-dependent scalable texture
– Information about the user's view position in a 3D scene is transmitted on a back-channel
– Only the necessary texture information is transmitted to the user
View-dependent Scalable Texture
Original texture
The texture is mapped on a surface
What the user sees
Other formats
Microsoft, RealVideo, QuickTime, ...
All are variations of the hybrid coder used in MPEG-coders, with some extra features.
New Stuff
ITU and ISO in cooperation:
H.264 = MPEG-4 part 10
Finished in 2003.
H.264 / MPEG-4 part 10
4×4 integer transform (approximating DCT).
Prediction of blocks of sizes up to 16×16.
Motion vectors for blocks of sizes 4×4 up to 16×16.
Up to 5 reference images for prediction.
Non-uniform quantization.
Arithmetic coding of run-level pairs.
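The integer transform can be illustrated with the 4×4 core matrix commonly cited for H.264. The sketch computes Y = C X Cᵀ in exact integer arithmetic; it assumes that the scaling which makes the transform approximate the DCT is folded into the quantizer, which is not shown here. The function names are hypothetical.

```python
# Core matrix of the 4x4 forward integer transform (as commonly cited).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Product of two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(x):
    """Y = CF * X * CF^T, using integer operations only."""
    return matmul(matmul(CF, x), transpose(CF))


flat = [[1] * 4 for _ in range(4)]
print(forward_core_transform(flat))
```

Since only integer additions, subtractions and multiplications by 2 are involved, encoder and decoder compute exactly the same values, which avoids the mismatch that a floating-point DCT can cause.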
What about the sound?
MPEG-1
– Audio layer I, II and III (mp3).
MPEG-2
– Four channels, same codec as in MPEG-1.
– AAC (Advanced Audio Coding) added later.
MPEG-4
– AAC
– Two speech coders
– Structured audio
– And more...
More on audio coding in Lecture 11.
Conclusion
Colour coding
– Change basis from RGB to YUV
– Colour components are compressed harder than the luminance
Moving image coding
– Hybrid coding: Motion compensated predictive coding and transform coding of the prediction error
– I-, P-, and B-frames
– Object-based coding (MPEG-4) mixing synthetic and natural audio & video
Conclusion (cont)
Standards
– MPEG-1: Video CD
– MPEG-2: Digital TV
– MPEG-4: Multimedia
– H.261: ISDN videophone
– H.263: PSTN videophone
– H.264 / MPEG-4 part 10: Universal video
That was the last slide!