(12) United States Patent
Ramasastry et al.
(10) Patent No.: US 7,522,774 B2
(45) Date of Patent: Apr. 21, 2009
(54) METHODS AND APPARATUSES FOR COMPRESSING DIGITAL IMAGE DATA
(75) Inventors: Jayaram Ramasastry, Woodinville, CA (US); Partho Choudhury, Maharashtra (IN); Ramesh Prasad, Maharashtra (IN)
(73) Assignee: Sindhara Supermedia, Inc., Redmond, WA (US)
Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 607 days.
(21) Appl. No.: 11/077,106
(22) Filed: Mar. 9, 2005
(65) Prior Publication Data
US 2005/0207664 A1 Sep. 22, 2005
Related U.S. Application Data
(60) Provisional application No. 60/552,153, filed on Mar. 10, 2004, provisional application No. 60/552,356, filed on Mar. 10, 2004, provisional application No. 60/552,270, filed on Mar. 10, 2004.
(51) Int. Cl. G06K 9/36 (2006.01)
(52) U.S. Cl. 382/232
(58) Field of Classification Search: 382/232, 382/239, 240, 248; 708/317, 400-401; 375/240, 375/240.01, 240.02, 240.11, 240.18, 240.19; 348/384.1, 398.1, 404.1
See application file for complete search history.
[Front-page figure: Server (Acquisition, Encoder, optional Decoder) connected through a Network (e.g., wired and/or wireless) to a Client (Decoder, optional Encoder)]
(56) References Cited
U.S. PATENT DOCUMENTS
5,585,852 A * 12/1996 Agarwal ... 375/240.11
5,881,176 A * 3/1999 Keith et al. ... 382/248
6,967,600 B2 * 11/2005 Kadono et al. ... 341/67
7,295,608 B2 * 11/2007 Reynolds et al. ... 375/240.01
7,333,814 B2 * 2/2008 Roberts ... 455/452.2
* cited by examiner
Primary Examiner: Jose L. Couso
(74) Attorney, Agent, or Firm: Blakely, Sokoloff, Taylor & Zafman LLP
(57) ABSTRACT
Methods and apparatuses for compressing digital image data are described herein. In one embodiment, a wavelet transform is performed on each pixel of a frame to generate multiple wavelet coefficients representing each pixel in a frequency domain. The wavelet coefficients of a sub-band of the frame are iteratively encoded into a bit stream based on a target transmission rate, where the sub-band of the frame is obtained from a parent sub-band of a previous iteration. The encoded wavelet coefficients satisfy a predetermined threshold based on a predetermined algorithm while the wavelet coefficients that do not satisfy the predetermined threshold are ignored in the respective iteration. Other methods and apparatuses are also described.
30 Claims, 18 Drawing Sheets
U.S. Patent Apr. 21, 2009 Sheet 3 of 18 US 7,522,774 B2
(1) Physical Layer (W-CDMA, CDMA 1X, cdma2000, GSM, GPRS, UMTS, iDEN)
(2) Data Link Control (DLC)
(3) Third party ISO protocol stack (TCP/IP/UDP)
(4) Streaming protocol stack (RTP, RTSP, RTCP, DDP)
(5) Billing and other ancillary services
(6) Network Aware Layer (NAL)
(7) Application Layer APIs for QwikStream™, QwikVu™ and QwikTex™
(8) Content Generation Engine
(9) Data Repository
Fig. 3
U.S. Patent Apr. 21, 2009 Sheet 4 of 18 US 7,522,774 B2
[Encoder flow: Raw YUV color frame data -> Wavelet Transform filter bank -> Source Encoder (ARIES) -> Channel encoding (Tree partitioning, CRC, RCPC)]
Fig. 4A
U.S. Patent Apr. 21, 2009 Sheet 5 of 18 US 7,522,774 B2
[Decoder flow: Compressed Image (.qvx file format) -> Channel decoding (Tree merging, CRC, RCPC) -> Source Decoder (I-ARIES) -> Inverse Wavelet Transform -> Raw YUV data]
Fig. 4B
U.S. Patent Apr. 21, 2009 Sheet 6 of 18 US 7,522,774 B2
[Flowchart, one block per line:]
Perform a wavelet transformation on each image pixel to transform the pixel into one or more coefficients in one or more wavelet maps.
Encode each wavelet map by representing the significance, sign and bit plane information of the pixel using a single bit in a bit stream.
Encode the significant bits into a context variable dependent upon the information represented by the bit and its location of the coefficient being coded (e.g., the probability of occurrence of a predetermined set of bits immediately preceding the current bit).
Transmit the content of the context variable as a bit stream as an output representing the encoded pixels.
Fig. 5
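The first block of the flowchart above, the wavelet transformation of image pixels into wavelet maps, can be illustrated with a minimal sketch. The filter bank is not specified at this point in the text, so the example assumes a one-level, unnormalized Haar filter (averages and differences); the function names `haar_1d` and `haar_2d` are hypothetical and chosen only for illustration. The sub-band labels LL, HL, LH and HH follow the sub-tree names that appear in Fig. 6.

```python
def haar_1d(values):
    """One level of an unnormalized Haar transform (assumed filter):
    pairwise averages (low-pass) and pairwise half-differences (high-pass)."""
    half = len(values) // 2
    low = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    high = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return low, high


def haar_2d(frame):
    """Split a frame with even width and height into four sub-bands.

    Rows are filtered first (horizontal direction), then columns.
    HL = high-pass horizontally, low-pass vertically; LH is the reverse.
    """
    row_low, row_high = [], []
    for row in frame:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)

    def filter_columns(block):
        # Transform every column, then transpose back to row-major order.
        cols = [haar_1d(list(col)) for col in zip(*block)]
        low = [list(r) for r in zip(*(c[0] for c in cols))]
        high = [list(r) for r in zip(*(c[1] for c in cols))]
        return low, high

    ll, lh = filter_columns(row_low)
    hl, hh = filter_columns(row_high)
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}
```

Applying `haar_2d` again to the returned `LL` band would give the next, coarser decomposition level, which is how a parent and child sub-band hierarchy arises.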
U.S. Patent Apr. 21, 2009 Sheet 7 of 18 US 7,522,774 B2
[Coefficient tree: Sub-tree 1 (HL), Sub-tree 2 (LH), Sub-tree 3 (HH)]
Fig. 6
U.S. Patent Apr. 21, 2009 Sheet 8 of 18 US 7,522,774 B2
Fig. 7
U.S. Patent Apr. 21, 2009 Sheet 9 of 18 US 7,522,774 B2
[Flowchart, one block per line:]
Determine a number of iterations (nl) based on a number of quantization levels, which may be determined on the largest wavelet coefficient, and set an initial quantization threshold T = 2^nl.
Populate all insignificant pixels in IPQ, all insignificant pixels having descendants in ISQ, and all significant pixels in SPQ.
For each type I entry of ISQ, if the entry is significant with respect to a current quantization threshold, remove the respective entry from ISQ and append it in the SPQ.
For each type I entry of ISQ, if the entry is insignificant with respect to a current quantization threshold, remove the respective entry from ISQ and append it in the IPQ.
If the respective type I entry includes descendants, remove the entry from the ISQ and append it at the end of ISQ as type II entry for next iteration; otherwise, the entry is purged.
For each type II entry of ISQ, if the entry is significant with respect to a current quantization threshold, all offspring of the current ISQ entry are appended to the end of ISQ as type I entries for next iteration.
Remove any entry in IPQ that is significant with respect to the current quantization threshold and append it in the [remainder of block not legible]
Fig. 8
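The first two blocks of the flowchart above (choosing the initial quantization threshold and populating the three queues) can be sketched as follows. This is only an illustration under stated assumptions: the exponent is taken as the floor of log2 of the largest coefficient magnitude, a rule common to embedded zerotree-style coders but not confirmed by the text here; the flowchart wording is read literally, so an insignificant pixel with descendants is placed in both IPQ and ISQ; and the `children` mapping, like the function names, is hypothetical.

```python
def initial_threshold(values):
    """Return (n, T) with T = 2**n, where 2**n is the largest power of two
    not exceeding the largest coefficient magnitude (assumed rule)."""
    largest = max(abs(v) for v in values)
    if largest < 1:
        return 0, 1
    n = int(largest).bit_length() - 1  # floor(log2(largest)) without float error
    return n, 1 << n


def populate_queues(coefficients, children, threshold):
    """Fill IPQ, ISQ and SPQ following a literal reading of the flowchart.

    coefficients: dict mapping a pixel position to its wavelet coefficient.
    children:     dict mapping a position to its descendant positions
                  (hypothetical tree structure supplied by the caller).
    """
    ipq, isq, spq = [], [], []
    for position, value in coefficients.items():
        if abs(value) >= threshold:
            spq.append(position)          # significant pixel
        else:
            ipq.append(position)          # insignificant pixel
            if children.get(position):
                isq.append(position)      # insignificant pixel with descendants
    return ipq, isq, spq
```

The later blocks of the flowchart, which move entries between the queues as the threshold changes, are not modelled in this sketch.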
U.S. Patent Apr. 21, 2009 Sheet 10 of 18 US 7,522,774 B2
[Video encoder block diagram: Wavelet Transform filter bank; ME/MC, with a "Bypass ME/MC for I frames" path; motion information; Channel encoding (Tree partitioning, CRC, RCPC); outputs: compressed file, streaming data]
Fig. 9A
U.S. Patent Apr. 21, 2009 Sheet 11 of 18 US 7,522,774 B2
[Video encoder block diagram: Raw YUV color frame data; ME/MC; I frames; motion information; Source Encoder (ARIES I/II); I-ARIES I/II; Channel encoding (Tree partitioning, CRC, RCPC); outputs: Compressed File (optional), Streaming data]
Fig. 9B
U.S. Patent Apr. 21, 2009 Sheet 12 of 18 US 7,522,774 B2
[Video decoder block diagram: inputs: Streaming data, Compressed Video (.qsx file format, optional); CABAC coded motion information; Channel decoding (Tree merging, CRC, RCPC); Source Decoder (I-ARIES I/II); MC with Frame Buffer, with a "Bypass MC for I frames" path; Inverse Wavelet Transform; output: Raw YUV data]
Fig. 10A
U.S. Patent Apr. 21, 2009 Sheet 13 of 18 US 7,522,774 B2
[Video decoder block diagram: inputs: Streaming data, Compressed Video (.qsx file format, optional); CABAC coded motion information; Channel decoding (Tree merging, CRC, RCPC); Source Decoder (I-ARIES I/II); MC with Frame Buffer, with a "Bypass MC for I frames" path; Inverse Wavelet Transform; output: Raw YUV data]
Fig. 10B
U.S. Patent Apr. 21, 2009 Sheet 14 of 18 US 7,522,774 B2
[Flowchart, one block per line:]
Identify a reference frame (e.g., the first frame or an I-frame).
Perform a ME/MC on the coarsest subbands as parent subbands of a current frame other than the I-frame with respect to the identified reference frame to generate one or more motion vectors for the coarsest subbands.
Estimate the spatial shifting of pixels of child subbands using the motion vectors of the parent subbands to determine a search area of the child subbands.
Perform a ME/MC for the child subbands to determine the motion vectors of the child subbands.
More child subbands? (decision block)
Perform compression on the predicted/compensated data into compressed data (e.g., see Figs. 5 and 8).
Fig. 11
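The parent-to-child motion search in the flowchart above can be sketched as a coarse block match followed by a refinement around a scaled parent vector. The sketch makes several assumptions that the text at this point does not state: blocks are compared with a sum of absolute differences (SAD), a child sub-band has twice the resolution of its parent so the parent vector is doubled, and a motion vector is the (dy, dx) displacement from the block position in the current sub-band to the matching position in the reference sub-band. All function names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(band, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in band[top:top + size]]


def best_vector(current, reference, top, left, size, center=(0, 0), radius=1):
    """Full search in a (2*radius+1)^2 window around `center`.

    Returns the (dy, dx) with the lowest SAD, or None if every candidate
    falls outside the reference sub-band.
    """
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best, best_cost = None, None
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block is not fully inside the reference
            cost = sad(target, get_block(reference, y, x, size))
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


def child_search_center(parent_vector, scale=2):
    """Scale a parent sub-band vector to seed the child search area
    (the factor of 2 assumes a dyadic decomposition)."""
    return (parent_vector[0] * scale, parent_vector[1] * scale)
```

In use, `best_vector` would first run on the coarsest sub-band with `center=(0, 0)`; its result, passed through `child_search_center`, becomes the `center` for a small-radius search on each child sub-band, which keeps the search area small at the finer levels.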
U.S. Patent Apr. 21, 2009 Sheet 15 of 18 US 7,522,774 B2
[Diagram labels recovered: Boundary of the Search Area for refinement MVs; Refinement vector for level k, orientation o; Block Neighborhood]
Fig. 12
U.S. Patent Apr. 21, 2009 Sheet 16 of 18 US 7,522,774 B2
[Panel titles: Integer Motion Prediction; Half-Pel Motion Prediction]
U.S. Patent Apr. 21, 2009 Sheet 17 of 18 US 7,522,774 B2
[Panel titles: Integer Motion Prediction; Half-Pel Motion Prediction]
Fig. 14
U.S. Patent Apr. 21, 2009 Sheet 18 of 18 US 7,522,774 B2
[Panel titles: OBMC when current block being tested is in 1MV mode; OBMC when current block being tested is in 4MV mode. Legend: Block currently being tested; Matching block; Motion Vector (identical colors denote MVs of the same block); Displaced MV to translate [text not legible] being tested]
Fig. 15
METHODS AND APPARATUSES FOR COMPRESSING DIGITAL IMAGE DATA
This application claims the benefit of U.S. Provisional Application No. 60/552,153, filed Mar. 10, 2004, U.S. Provisional Application No. 60/552,356, filed Mar. 10, 2004, and U.S. Provisional Application No. 60/552,270, filed Mar. 10, 2004. The above-identified applications are hereby incorporated by reference in their entirety.
FIELD OF THE INVENTION
The present invention relates generally to multimedia applications. More particularly, this invention relates to compressing digital image data.
BACKGROUND OF THE INVENTION
A variety of systems have been developed for the encoding and decoding of audio/video data for transmission over wireline and/or wireless communication systems over the past decade. Most systems in this category employ standard compression/transmission techniques, such as, for example, the ITU-T Rec. H.264 (also referred to as H.264) and ISO/IEC Rec. 14496-10 AVC (also referred to as MPEG-4) standards. However, due to their inherent generality, they lack the specific qualities needed for seamless implementation on low power, low complexity systems (such as hand held devices including, but not restricted to, personal digital assistants and smart phones) over noisy, low bit rate wireless channels.
Due to the likely business models rapidly emerging in the wireless market, in which cost incurred by the consumer is directly proportional to the actual volume of transmitted data, and also due to the limited bandwidth, processing capability, storage capacity and battery power, efficiency and speed in compression of audio/video data to be transmitted is a major factor in the eventual success of any such multimedia content delivery system. Most systems in use today are retrofitted versions of identical systems used on higher end desktop workstations. Unlike desktop systems, where error control is not a critical issue due to the inherent reliability of cable LAN/WAN data transmission, and bandwidth may be assumed to be almost unlimited, transmission over limited capacity wireless networks requires integration of such systems that may leverage suitable processing and error-control technologies to achieve the level of fidelity expected of a commercially viable multimedia compression and transmission system.
Conventional video compression engines, or codecs, can be broadly classified into two broad categories. One class of coding strategies, known as a download-and-play (D&P) profile, not only requires the entire file to be downloaded onto the local memory before playback, leading to a large latency time (depending on the available bandwidth and the actual file size), but also makes stringent demands on the amount of buffer memory to be made available for the downloaded payload. Even with the more sophisticated streaming profile, the current physical limitations on current generation transmission equipment at the physical layer force service providers to incorporate a pseudo-streaming capability, which requires an initial period of latency (at the beginning of transmission), and continuous buffering henceforth, which imposes a strain on the limited processing capabilities of the hand-held processor. Most commercial compression solutions in the market today do not possess a progressive transmission capability, which means that transmission is possible only until the last integral frame, packet or bit before bandwidth drops below the minimum threshold. In case of video codecs, if the connection breaks before the transmission of the current frame, this frame is lost forever.
Another drawback in conventional video compression codes is the introduction of blocking artifacts due to the block-based coding schemes used in most codecs. Apart from the degradation in subjective visual quality, such systems suffer from poor performance due to bottlenecks introduced by the additional de-blocking filters. Yet another drawback is that, due to the limitations in the word size of the computing platform, the coded coefficients are truncated to an approximate value. This is especially prominent along object boundaries, where Gibbs' phenomenon leads to the generation of a visual phenomenon known as mosquito noise. Due to this, the blurring along the object boundaries becomes more prominent, leading to degradation in overall frame quality.
Additionally, the local nature of motion prediction in some codes introduces motion-induced artifacts, which cannot be easily smoothened by a simple filtering operation. Such problems arise especially in cases of fast motion clips and systems where the frame rate is below that of natural video (e.g., 25 or 30 fps non-interlaced video). In either case, the temporal redundancy between two consecutive frames is extremely low (since much of the motion is lost in between the frames itself), leading to poorer tracking of the motion across frames. This effect is cumulative in nature, especially for a longer group of frames (GoF).
Furthermore, mobile end-user devices are constrained by low processing power and storage capacity. Due to the limitations on the silicon footprint, most mobile and hand-held systems in the market have to time-share the resources of the central processing unit (microcontroller or RISC/CISC processor) to perform all its DSP, control and communication tasks, with little or no provisions for a dedicated processor to take the video/audio processing load off the central processor. Moreover, most general-purpose central processors lack the unique architecture needed for optimal DSP performance. Therefore, a mobile video-codec design must have minimal client-end complexity while maintaining consistency on the efficiency and robustness front.
SUMMARY OF THE INVENTION
Methods and apparatuses for compressing digital image data are described herein. In one embodiment, a wavelet transform is performed on each pixel of a frame to generate multiple wavelet coefficients representing each pixel in a frequency domain. The wavelet coefficients of a sub-band of the frame are iteratively encoded into a bit stream based on a target transmission rate, where the sub-band of the frame is obtained from a parent sub-band of a previous iteration. The encoded wavelet coefficients satisfy a predetermined threshold based on a predetermined algorithm while the wavelet coefficients that do not satisfy the predetermined threshold are ignored in the respective iteration.
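As a rough illustration of iterative, threshold-based encoding toward a target rate, the following sketch emits coefficients pass by pass and stops once a bit budget is spent. It is a toy model under loudly stated assumptions, not the encoder of the invention: the threshold starts at the largest power of two not exceeding the largest magnitude and is halved each iteration, each emitted record is charged a fixed cost of one index plus one sign bit, and the name `encode_to_budget` is hypothetical.

```python
def encode_to_budget(coefficients, bit_budget):
    """Emit (threshold, index, is_negative) records until the budget is spent.

    Coefficients below the current threshold are skipped in that iteration
    and reconsidered in the next one, after the threshold is halved.
    """
    largest = max(abs(c) for c in coefficients)
    threshold = 1 << (int(largest).bit_length() - 1) if largest >= 1 else 1
    # Assumed cost model: bits to address one coefficient, plus a sign bit.
    index_bits = max(1, (len(coefficients) - 1).bit_length())
    cost = index_bits + 1

    pending = list(range(len(coefficients)))
    stream, spent = [], 0
    while threshold >= 1 and pending:
        still_pending = []
        for i in pending:
            if abs(coefficients[i]) >= threshold:
                if spent + cost > bit_budget:
                    return stream, spent      # target rate reached
                stream.append((threshold, i, coefficients[i] < 0))
                spent += cost
            else:
                still_pending.append(i)       # ignored in this iteration
        pending = still_pending
        threshold //= 2
    return stream, spent
```

Because the largest coefficients are emitted first, truncating the stream at any budget still leaves the most significant information, which is the property a progressive transmission scheme relies on.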
Other features of the present invention will be apparent from the accompanying drawings and from the detailed description which follows.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements.
FIG. 1 is a block diagram illustrating an exemplary multimedia streaming system according to one embodiment.