Journal of Systems Engineering and Electronics
Vol. 18, No. 4, 2007, pp. 795–800
High performance scalable image coding
Gan Tao, He Yanmin & Zhu Weile
School of Electronic Engineering, Univ. of Electronic Science and Technology of China, Chengdu 610054, P. R. China
(Received October 19, 2006)
Abstract: A high performance scalable image coding algorithm is proposed. The salient features of this algorithm
are the ways to form and locate the significant clusters. Thanks to the list structure, the new coding algorithm
achieves fine fractional bit-plane coding with negligible additional complexity. Experiments show that it performs
comparably or better than the state-of-the-art coders. Furthermore, the flexible codec supports both quality and
resolution scalability, which is very attractive in many network applications.
Keywords: scalable image coding, significant clusters, list structure.
1. Introduction
With the development of the Internet and networking
technology, there is a trend of growing heterogeneity
in digital image applications. Interesting examples in-
clude image database previewing and progressive im-
age transmission where the constraints on bit rate or
display resolution cannot be anticipated at the time
of compression. The challenge is how a single flex-
ible bitstream can be provided to accommodate the
users’ different needs, according to their bandwidth
and computing capabilities. Scalable image coding
thus emerges as a promising technology to answer this
challenge. There are generally two types of scalabil-
ity in image coding: quality scalability and resolution
scalability. Quality scalability is a feature of the en-
coded bitstream that allows decoders to decode the
image in the same spatial resolution, but with dif-
ferent fidelity. A fully quality scalable bitstream, also
known as an embedded bitstream, can be truncated at
any point to achieve the best possible reconstruction
for the number of bits received. Resolution scalabil-
ity, on the other hand, is a useful functionality that
allows decoders to decode the image with different res-
olutions.
Over the past decade, great efforts have been made
to develop image compression algorithms, which have
features of good compression performance, modest
complexity, as well as high scalability. Wavelet-
based compression schemes have demonstrated the
promise to achieve the goal. A good example is the
JPEG2000 image compression standard. In its core
algorithm, Taubman's embedded block coding with
optimized truncation (EBCOT)[1], fractional bit-plane
coding combined with post-compression rate-distortion
(PCRD) optimization achieves high performance.
However, as all coefficients in each bit-plane must be
coded through several passes, the computational
complexity is rather high. On the
other hand, to avoid pixel-by-pixel coding, codecs
based on morphological representation have been
proposed, such as Servetto et al.'s morphological
representation of wavelet data (MRWD) algorithm[2],
Chai et al.'s significance-linked connected component
analysis (SLCCA) algorithm[3], and Lazzaroni et al.'s
embedded morphological dilation coding (EMDC)
algorithm[4]. However, to the authors' knowledge, none
of the morphocodecs reported so far in the literature
has solved the problem of high complexity caused by
data ordering and redundant dilation operations, and
none of them achieves both types of scalability in a
single framework.
2. Data modelling and classification
2.1 Representation of wavelet data
There exist two distinct approaches to get an efficient
classification and representation of wavelet coefficients
in the literature. Although Shapiro’s embedded ze-
rotree wavelet compression algorithm (EZW)[5] and
Said and Pearlman’s set partitioning in hierarchical
tree (SPIHT) algorithm[6] employ a regular zerotree
structure to approximate insignificant groups of coeffi-
cients across subbands, MRWD, SLCCA, and EMDC
represent irregular clusters of significant coefficients
within subbands. The well-known SPIHT algorithm
enjoys good performance with low complexity. Yet, it
has its own drawbacks. First, it fails to efficiently rep-
resent the regions rich in texture. As many nonzero
coefficients exist in high frequency subbands, which
are not well aligned with the tree-structure grid, ze-
rotree structure becomes quite inefficient. Second, as
the bitstreams from different scales are interwoven,
SPIHT does not support resolution scalability. Efforts
have been made by Danyali and Mertins to add the
resolution scalability feature to SPIHT[7], but their
algorithm suffers from a loss of performance. Based
on these considerations, the authors have adopted the
clustering representation instead of zerotree in their
scheme. Being free of the constraint of tree struc-
ture, the subbands can be coded independently, which
makes the task of scalability supporting much easier.
2.2 Classification and ordering
The embedded coding raises the problem of ordering
information according to its importance. It is prefer-
able to code and transmit the most valuable infor-
mation as early as possible. This is the problem of
pixel sorting, originally considered by Li and Lei[8].
For each pixel to be coded, the rate-distortion (R-D)
slope is estimated. The optimal rate-distortion perfor-
mance can be achieved by selecting the pixel with the
maximum R-D slope at each coding stage. The high
complexity involved in the optimal solution is gener-
ally unacceptable in practice, thus fractional bit-plane
coding becomes a good candidate.
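As an illustration of the costly optimal ordering (a sketch, not the authors' code), the following greedily sorts pixels by their estimated R-D slope; the assumption here is that each pixel carries an estimated distortion decrease and rate cost:

```python
def rd_order(pixels):
    """Order pixels by decreasing R-D slope (distortion decrease per bit).

    pixels: mapping name -> (distortion_decrease, rate_cost).
    """
    slope = lambda item: item[1][0] / item[1][1]
    return [name for name, _ in sorted(pixels.items(), key=slope, reverse=True)]

order = rd_order({"a": (10.0, 2.0), "b": (9.0, 1.0), "c": (1.0, 1.0)})
# slopes: b = 9.0, a = 5.0, c = 1.0, so the order is b, a, c
```

Re-evaluating and re-sorting all candidates at every coding stage is what makes the exact approach impractical, which is precisely why fractional bit-plane coding is attractive as an approximation.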
Bit-plane coding is the natural choice for embed-
ded image compression. For EBCOT in JPEG2000,
one bit-plane coding pass is further decomposed into
three passes according to the pixels' significance status.
The unknown pixels are classified into two categories:
the pixels that have at least one immediate significant
neighbor are coded in the first significance propagation
pass, whereas the others are coded in the final
clean-up pass. Obviously, this classification is rather
coarse: pixels whose significant neighbors differ in
value and number are not distinguished. To
get a better data classification, Peng et al. proposed
a pixel classification and sorting method[9]. In their
algorithm, up to eight significant coding passes and
one magnitude refinement pass are employed in each
bit-plane coding. Consequently, the performance is
improved at the price of higher complexity. In the
following, the authors present their classification and
ordering scheme, as a better trade-off between the per-
formance and complexity.
The unknown pixels to be coded in each bit-plane
are categorized into the following types: (1) the ones
that have a significant neighbor with a big value;
(2) the ones that have a significant neighbor with a
small value. For the pixels with no significant neighbor,
a further classification is made according to the
significance state of their parents: (3) the ones that
have a significant parent with a big value; (4) the ones
that have a significant parent with a small value.
Note that the classification, which is based on clus-
tering nature of wavelet coefficients, is finer than that
of EBCOT. It has been shown that in significant cod-
ing, the symbol with higher probability of significance
has a larger R-D slope[8]. The authors claim that
the unknown pixels, sorted by their types (1)–(4), are
ordered in decreasing probability of significance, and
thus better performance is expected to be achieved.
This claim is substantiated by the experiments.
Furthermore, as seen later on, the implementation
of this classification can be done efficiently. The addi-
tional complexity is negligible.
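The four-type classification above can be sketched as follows; `sig_old`, `sig_new`, `neighbors`, and `parent` are illustrative names (maps of pixels that became significant in earlier vs. the current bit-plane, a neighborhood function, and the parent mapping), not identifiers from the paper:

```python
def classify(pixel, sig_old, sig_new, neighbors, parent):
    """Return coding type 1-4, in decreasing probability of significance."""
    nbrs = neighbors(pixel)
    if any(sig_old.get(p, False) for p in nbrs):
        return 1  # a significant neighbor with a big value
    if any(sig_new.get(p, False) for p in nbrs):
        return 2  # a significant neighbor with a small value
    par = parent(pixel)
    if sig_old.get(par, False):
        return 3  # a significant parent with a big value
    if sig_new.get(par, False):
        return 4  # a significant parent with a small value
    return None   # no significant neighbor or parent: left to set partitioning
```

Types 1 and 2 drive the intraband sub-passes and types 3 and 4 the interband sub-passes described in Section 3.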
3. Proposed morphological coding
algorithm
3.1 Fractional bit-plane coding
In this algorithm, wavelet data of subbands are repre-
sented and coded as clusters. A cluster is a mass of
significant pixels surrounded by insignificant ones.
The data are organized in the following lists:
LSPk: the list of all significant pixels, which form
the core of the clusters in the k-th spatial resolution
level.
LIPk: the list of all insignificant pixels, which form
the boundary of the clusters in the k-th spatial reso-
lution level.
LISk: the list of insignificant sets of varying sizes in
the k-th spatial resolution level, which represent the
remaining insignificant parts.
The elements in both LSPk and LIPk are further
classified into the new and the old ones, depending
on whether they are added to the list in the current
bit-plane pass or not.
Let P^n denote bit-plane pass n, P_k^{n,s} the s-th
sub-bitplane pass for the processing of resolution level k,
and N the maximum number of spatial resolution levels.
On the basis of the list structures defined earlier,
the authors have developed a fractional bit-plane cod-
ing with the following six sub-passes {P_k^{n,s}}, to
achieve excellent R-D performance.
Sub-pass 1 (P_k^{n,1}, k ∈ [1, N]): intraband cluster
growing for each entry of LIPk, that is, morphological
dilation on the boundary of the old clusters.
Sub-pass 2 (P_k^{n,2}, k ∈ [1, N]): intraband cluster
growing for the new entries of LSPk, that is, morpho-
logical dilation on the new clusters.
Sub-pass 3 (P_k^{n,3}, k ∈ [1, N]): interband cluster
expansion for old entries of LSPk−1, that is, morpho-
logical dilation on the pixels whose parents belong to
the old clusters.
Sub-pass 4 (P_k^{n,4}, k ∈ [1, N]): interband cluster
expansion for new entries of LSPk−1, that is, morpho-
logical dilation on the pixels whose parents belong to
the new clusters.
Sub-pass 5 (P_k^{n,5}, k ∈ [1, N]): refinement pass for
old entries of LSPk.
Sub-pass 6 (P_k^{n,6}, k ∈ [1, N]): isolated cluster
location by the set partitioning method on each entry
of LISk.
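A schematic driver for one bit-plane pass might look like the following sketch; the scan order within a bit-plane (sub-pass outer, resolution level inner) is an assumption matching the listing above, not code from the paper:

```python
def code_bitplane(n, levels, sub_passes):
    """Run the six sub-passes of bit-plane pass n over all resolution levels.

    sub_passes: six callables, one per sub-pass, each invoked as f(n, s, k).
    """
    for s, sub_pass in enumerate(sub_passes, start=1):
        for k in range(1, levels + 1):     # each sub-pass sweeps levels 1..N
            sub_pass(n, s, k)
```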
It is easy to see that the proposed fractional bit-plane
coding is simply an implementation of the afore-
mentioned data classification scheme. Following the
observation that the significance probability is primarily
determined by a single significant neighbor[9], the au-
thors do not distinguish the cases where a pixel has
one or more significant neighbors. Additionally, instead
of introducing a special comparison procedure
to identify the significant neighbors with different val-
ues, the neighbors that have been detected as signif-
icant in the previous bit-plane are taken as the ones
with big values, and the neighbors that have newly
become significant in the current bit-plane are taken
as the ones with small values. Thus, the task is simpli-
fied by keeping two lists, which contain the old and the
new entries during each bit-plane coding and merging
them at the end of the pass. It can be seen that with
the help of the list structure, this classification scheme
is implemented very efficiently.
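The old/new bookkeeping described above can be sketched with a minimal list wrapper (illustrative, not the authors' implementation): entries that became significant in earlier bit-planes count as "big" values and newly significant entries as "small" ones, so no value comparison is ever needed.

```python
class SignificantList:
    """A list split into old entries (significant since earlier bit-planes)
    and new entries (significant in the current bit-plane)."""

    def __init__(self):
        self.old, self.new = [], []

    def mark_significant(self, pixel):
        self.new.append(pixel)      # newly significant: a "small" value

    def end_of_pass(self):
        self.old.extend(self.new)   # merge: "small" values become "big" ones
        self.new.clear()
```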
It is worth noting that the new approach to cluster
formation is the distinguishing feature of the proposed
algorithm. As far as is known, all the reported
morphocodecs create new clusters at the beginning of each
bit-plane coding, by taking the significant pixels as the
seeds for new morphological dilation. Because of the
lack of a proper structure, they fail to take advantage
of the fact that the clusters tend to grow both in the
spatial and frequency domains when crossing succes-
sive bit-planes. Here, the problem is easily overcome
by starting the growing region from the boundary of
clusters that already exist, that is, from the pixels in
LIPk, similar to what is done in sub-pass 1. As all
the inner significant pixels of the clusters have been
coded in the previous bit-plane, the reconsideration is
unnecessary and wasteful.
3.2 Patterned morphological dilation
The idea of morphological representation was first in-
troduced by Servetto in his MRWD algorithm. The
clusters of significant coefficients are formed and coded
through iterative operations of morphological dilation.
More details can be found in Refs. [2–3]. In this algo-
rithm, a 3 × 3 structuring element is used. Owing to the
clustering nature of the wavelet data, the recursive dilations
working on the adjacent pixels may cause highly re-
dundant operations. For example, in an extreme case,
the seed pixel of the previous dilation has eight unnec-
essary tests in the later process. To overcome this, a
new dilation operator, called patterned morphological
dilation, is introduced.
Considering one general case depicted in Fig. 1(a),
in which the eight neighbors of pixel 0 have been tested
Fig. 1 Patterned morphological dilations: (a) one general case of morphological dilation; (b)–(i) eight morphological
dilation patterns dedicated to dilations at eight different positions. Gray and white circles denote seeds and
corresponding dilations, respectively
during the process of morphological dilation, suppose
that pixel 1 becomes significant; the dilation will then
be applied to it recursively. It is obvious that only the five
new pixels (pixel 9–pixel 13) instead of all the eight
neighbors of pixel 1 need to be considered. Similarly,
the morphological dilation on other pixels can also be
simplified. Based on this consideration, the authors
define eight morphological dilation patterns dedicated
to dilations on eight different positions as depicted in
Fig. 1(b)–(i). It can be seen that with these patterned
dilations, on average, half of the neighbor visits are
saved.
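The saving can be checked mechanically. Under the reading above, the pattern for a given dilation direction is the new seed's 3 × 3 ring minus the pixels already covered by the old seed's 3 × 3 block; a sketch:

```python
def pattern(direction):
    """New neighbors to test when dilation moves from a seed at the origin
    to its neighbor at `direction` = (dx, dy): the neighbor's 3x3 ring
    minus pixels already covered by the seed's own 3x3 block."""
    dx, dy = direction
    ring = [(dx + i, dy + j) for i in (-1, 0, 1) for j in (-1, 0, 1)
            if (i, j) != (0, 0)]
    return [(x, y) for (x, y) in ring if not (abs(x) <= 1 and abs(y) <= 1)]
```

Diagonal moves leave 5 new pixels to test and horizontal/vertical moves leave 3, so the eight directions average 4 visits out of 8, consistent with the claim that about half the visits are saved.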
3.3 Set partitioning method
The set partitioning method to locate the remaining
clusters is another feature of the proposed algorithm.
After the most significant clusters have been extracted
in the former sub-passes, a few remain scattered in
the remaining space of the subbands. They can
be effectively coded through the set-partitioning
approach. In traditional partitioning methods, such
as quadtree partitioning[10], the insignificant coeffi-
cients denoted by a zero block must lie in a square
block. However, in this situation, the insignificant
space is dominating and its shape is unpredictable.
It seems to be inefficient to represent it in a fixed
shape. Here, a more flexible set partitioning method
is proposed, in which the zero block can have
a square or rectangular shape. Unlike quadtree
partitioning, which subdivides a block into four
equally sized subblocks, binary partitioning is used
here. That is, blocks are divided in two, rather than
quarters. During partitioning, each set in LISk is
divided into approximately equally sized halves and
the authors alternate between splitting horizontally
and vertically. Once the significant pixel is located,
morphological dilation is applied to it immediately.
Additionally, to further improve efficiency, the
regions of known clusters are aggressively discarded
from the insignificant set. This can be accomplished
by shrinking the set boundary to a smaller box before
further partitioning.
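The binary partitioning step can be sketched as follows, with a set represented as an `(x, y, width, height)` rectangle (an assumed representation, not the paper's data structure):

```python
def bisect(rect, split_vertically):
    """Split a rectangle into two roughly equal halves.

    The caller alternates `split_vertically` between levels of the
    recursion, mirroring the alternating horizontal/vertical cuts.
    """
    x, y, w, h = rect
    if split_vertically:                  # cut along a vertical line
        w1 = w // 2
        return (x, y, w1, h), (x + w1, y, w - w1, h)
    h1 = h // 2                           # cut along a horizontal line
    return (x, y, w, h1), (x, y + h1, w, h - h1)
```

Once a significant pixel is located inside a half, morphological dilation takes over immediately, as described above.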
4. Codestream structure and parsing
In multimedia server-client environments, the re-
quested image quality and resolution level may vary
significantly from one application to another. To cater
to the needs, the bitstream can be generated by the
encoder in one particular format. The scalability fea-
ture enables the stream parser to extract the data of
interest and assemble the final codestream in the re-
quested format to the decoder.
Figure 2 shows an example of a codestream estab-
lished in the proposed algorithm for progression by
precision, with the finest granularity. In this case, the
bitstreams from different scales are interleaved in each
bit-plane pass P^n. Tags are inserted to provide the
information required for identifying each coding unit
P_k^{n,s} when parsing.
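A stream parser could then assemble a requested codestream by simple filtering, without decoding. This sketch assumes each coding unit is tagged `(n, s, k)` and that bit-planes are numbered so that larger n is more significant; both are illustrative assumptions:

```python
def extract(units, k_max, n_min):
    """Keep only coding units up to resolution level k_max and down to
    bit-plane n_min, preserving stream order.

    units: list of ((n, s, k), payload) pairs in stream order.
    """
    return [payload for (n, s, k), payload in units
            if k <= k_max and n >= n_min]
```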
Fig. 2 Example of the hierarchical layout of the codestream
5. Complexity analysis
In addition to its impressive performance, the pro-
posed algorithm enjoys low complexity, comparable
to that of SPIHT and lower than that of EBCOT and
the other morphocodecs.
To achieve high coding efficiency, the scheme of data
organization and operation is very similar to that of
SPIHT: three lists (LSP, LIP, LIS) are kept to group
the data of significant pixels, insignificant pixels, and
insignificant sets. The difference lies in the way of lo-
cating significant pixels (or significant clusters in the
authors' notation): in SPIHT this is accomplished by
recursively checking and partitioning the spatial
orientation trees, whereas the proposed algorithm uses
morphological dilation and binary partitioning. As
only simple basic operations are involved, the computa-
tional complexities of the two algorithms are compa-
rable. Besides, the authors' management of the lists
is more efficient. Whereas in SPIHT the significant
pixels are retained in temporary lists until the hi-
erarchical trees are decomposed to a certain level, here
these pixels are detected directly through cluster dila-
tion and prediction. Thus the operations to allocate
and free the temporary lists are saved.
As mentioned earlier, both EBCOT and the pro-
posed algorithm employ fractional bit-plane coding
to achieve good R-D performance. In EBCOT, each
unknown pixel is arithmetically coded in multiple
passes, based on the context. Therefore, its complex-
ity is proportional to the number of unknown pixels
Nunknown. In the proposed algorithm, multiple scan-
ning is avoided, as the data has already been sorted
in the lists. Finer fractional pass coding is achieved
through efficient list operations. More importantly,
as only the newly significant pixels and their neigh-
bors need to be coded, the complexity is on the or-
der of the number of significant pixels Nsig, which is
usually much smaller than Nunknown. Furthermore,
unlike EBCOT, there is no post optimization process-
ing involved. Therefore, it can be concluded that the
complexity of the proposed algorithm is less than that
of EBCOT.
Compared with other morphocodecs, this algorithm
has the striking advantage of lower computational
complexity. Whereas most morphocodecs in the liter-
ature so far start morphological dilation from all
significant pixels, here only those on cluster boundaries
are considered, so a significant number of redundant
operations is spared. Moreover, patterned morphological
dilation is introduced, which further cuts nearly half
the number of neighbor visits during the recursive
dilation.
6. Experimental results
The performance of the algorithm is evaluated on
five natural 512 × 512 grayscale images: Lena, Bar-
bara, Goldhill, Crowd, and Couple. The five-scale dis-
crete wavelet transform is used with the Daubechies
9/7 biorthogonal wavelet filter and symmetric exten-
sion mode. The bitstream is generated in a format
fully embedded by precision as illustrated in Fig. 2.
The performance is compared with two state-of-the-
art image coding algorithms, SPIHT and EBCOT.
The PSNR results are listed in Table 1.
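For reference, the PSNR figures follow the standard definition for 8-bit images; a minimal sketch, not tied to the authors' implementation:

```python
import math

def psnr(original, reconstructed):
    """PSNR in dB for 8-bit samples (peak value 255)."""
    mse = sum((a - b) ** 2
              for a, b in zip(original, reconstructed)) / len(original)
    return 10.0 * math.log10(255.0 ** 2 / mse)
```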
From the table, it can be seen that the performance
of the proposed algorithm is, on average, better than
that of SPIHT and EBCOT. Compared with SPIHT,
the improvement is evident, especially for images that are
rich in texture and detail. For instance, the proposed
algorithm outperforms SPIHT by 0.81 dB on average
for the image Barbara and by 0.55 dB for the image Couple.
Compared with EBCOT, better performance is also
achieved, which fits the expectation about this finer
classification scheme.
It should be noted that as the testing codestream is
organized in a fully embedded fashion, the overhead of
tags inserted in the stream cannot be ignored. The ex-
periment shows that the quality loss ranges from 0.04 dB
to 0.16 dB over various bit rates; the lower the bit rate,
the greater the loss. This is consistent with the
observation that the proposed algorithm shows growing
superiority as the bit rate increases. For example, at a
bit rate of 1.0 bpp, it achieves average PSNR gains of
0.52 dB over SPIHT and 0.39 dB over EBCOT.
Therefore, when a relatively coarse quality partitioning
is requested, better performance is expected, as the
amount of overhead data is substantially reduced.
Table 1 Performance comparison (PSNR/dB) of
SPIHT, EBCOT, and the proposed algorithm
Image bpp SPIHT EBCOT Proposed
Lena 0.125 31.09 31.05 31.16
0.25 34.11 34.16 34.30
0.50 37.21 37.29 37.46
1.00 40.41 40.48 40.59
Barbara 0.125 24.86 25.37 25.37
0.25 27.58 28.40 28.43
0.50 31.39 32.29 32.31
1.00 36.41 37.11 37.38
Goldhill 0.125 28.48 - 28.50
0.25 30.56 30.59 30.65
0.50 33.13 33.25 33.36
1.00 36.55 36.59 36.88
Crowd 0.125 27.07 26.85 27.07
0.25 30.15 29.97 30.16
0.50 33.89 33.68 34.17
1.00 38.86 38.63 39.34
Couple 0.125 26.76 26.92 26.94
0.25 29.05 29.35 29.65
0.50 32.45 32.67 33.09
1.00 36.58 36.65 37.20
7. Conclusions
The authors have presented a scalable image coding
algorithm, based on efficient methods
of cluster extraction. It employs new efficient frac-
tional bit-plane coding, based on a list structure and
introduces patterned morphological dilation to signifi-
cantly reduce the redundant dilation operations. Con-
sequently, as a counterpart of SPIHT, it achieves bet-
ter performance with comparatively low complexity.
Moreover, the flexible algorithm supports both qual-
ity and resolution scalability, which is very attractive
in many multimedia applications, especially in image
transmission over heterogeneous networks.
References
[1] Taubman D. High performance scalable image compression
with EBCOT. IEEE Trans. on Image Processing, 2000,
9(7): 1158–1170.
[2] Servetto S D, Ramchandran K, Orchard M T. Image coding
based on a morphological representation of wavelet data.
IEEE Trans. on Image Processing, 1999, 8(9): 1161–1174.
[3] Chai B, Vass J, Zhuang X. Significance-linked connected
component analysis for wavelet image coding. IEEE Trans.
on Image Processing, 1999, 8(6): 774–784.
[4] Lazzaroni F, Leonardi R, Signoroni A. High-performance
embedded morphological wavelet coding. IEEE Signal
Processing Letters, 2003, 10(10): 293–295.
[5] Shapiro J M. Embedded image coding using zerotrees of
wavelet coefficients. IEEE Trans. on Signal Processing,
1993, 41(12): 3445–3462.
[6] Said A, Pearlman W A. A new fast and efficient image
codec based on set partitioning in hierarchical trees. IEEE
Trans. on Circuits and Systems for Video Technology,
1996, 6 (3): 243–250.
[7] Danyali H, Mertins A. Flexible, highly scalable, object-
based wavelet image compression algorithm for network
applications. IEE Proceedings - Vision, Image and Signal
Processing, 2004, 151(6): 498–509.
[8] Li J, Lei S. Rate-distortion optimized embedding. Proc.
Picture Coding Symp., Berlin, Germany, 1997: 201–206.
[9] Peng K, Kieffer J C. Embedded image compression based
on wavelet pixel classification and sorting. IEEE Trans.
on Image Processing, 2004, 13(8): 1011–1017.
[10] Pearlman W, Islam A, Nagaraj N, Said A. Efficient, low-
complexity image coding with a set-partitioning embedded
block coder. IEEE Trans. on Circuits and Systems for
Video Technology, 2004, 14 (11): 1219–1235.
Gan Tao was born in 1977. He is a Ph. D. candidate
at Shanghai University. His research interests
include communication signal processing and multiple
antenna systems. E-mail: [email protected]
He Yanmin was born in 1977. Now she is a Ph. D.
candidate in Automation Engineering in the Univer-
sity of Electronic Science and Technology of China.
Zhu Weile was born in 1940. Now he is a professor
and doctoral supervisor in Electronic Engineering in
the University of Electronic Science and Technology
of China. His research interests include networking,
multimedia signal processing, digital video compres-
sion, and communication.