

2D-Pattern Matching Image and Video Compression: Theory, Algorithms, and Experiments

Marc Alzina, Wojciech Szpankowski, Senior Member, IEEE, and Ananth Grama

Manuscript received April 12, 1999; revised August 18, 2000. This work was supported by the NSF under Grants NCR-9415491, CCR-9804760, ACI-9872101, and EIA-9806741, and by a Purdue University GIFC Grant. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Lina J. Karam. M. Alzina is with ENST, 75013 Paris, France (e-mail: [email protected]). W. Szpankowski and A. Grama are with the Department of Computer Sciences, Purdue University, West Lafayette, IN 47907 USA (e-mail: [email protected]; [email protected]).

Abstract—In this paper, we propose a lossy data compression framework based on an approximate two-dimensional (2D) pattern matching (2D-PMC) extension of the Lempel–Ziv lossless scheme. This framework forms the basis upon which higher level schemes relying on differential coding, frequency domain techniques, prediction, and other methods can be built. We apply our pattern matching framework to image and video compression and report on theoretical and experimental results. Theoretically, we show that the fixed database model used for video compression leads to suboptimal but computationally efficient performance. The compression ratio of this model is shown to tend to the generalized entropy. For image compression, we use a growing database model for which we provide an approximate analysis. The implementation of 2D-PMC is a challenging problem from the algorithmic point of view. We use a range of techniques and data structures such as k-d trees, generalized run length coding, adaptive arithmetic coding, and variable and adaptive maximum distortion level to achieve good compression ratios at high compression speeds. We demonstrate bit rates in the range of 0.25–0.5 bpp for high-quality images and data rates in the range of 0.15–0.5 Mbps for a baseline video compression scheme that does not use any prediction or interpolation. We also demonstrate that this asymmetric compression scheme is capable of extremely fast decompression, making it particularly suitable for networked multimedia applications.

Index Terms—Approximate pattern matching, arithmetic coding, generalized run-length coding, generalized Shannon entropy, k-d trees, Lempel–Ziv schemes, multimedia compression, rate distortion.

I. INTRODUCTION

LOSSLESS data compression based on exact pattern matching can be traced back to the seminal papers of Lempel and Ziv [26], [27]. More recently, there has been a resurgence of interest in extending these schemes to lossy data compression (cf. [2]–[5], [11], [13], [15], [17], [20], [21], [23], [24]). This can be largely attributed to the rapid growth in digital representation of multimedia (e.g., text, audio, image, video, etc.), which is particularly amenable to exact and approximate pattern matching manipulations.


We believe that the first implementation of lossy image compression based on approximate pattern matching (i.e., a combination of vector quantization and lossy LZ78) was reported by Constantinescu and Storer [4]. However, no theoretical justification was provided for this scheme and the experimental results were preliminary. The key issue of whether such a scheme is asymptotically optimal (like its lossless Lempel–Ziv counterpart) was first addressed by Steinberg and Gutman [17], and eventually resolved independently by Łuczak and Szpankowski [15] and Yang and Kieffer [23]. They proved that a straightforward lossy extension of the Lempel–Ziv scheme is not asymptotically optimal: the compression bit rate asymptotically achieves the generalized Shannon entropy, which is higher than the optimal rate distortion function. More recently, Kontoyiannis [13] proposed a scheme based on approximate pattern matching that is asymptotically optimal for memoryless sources, but its practical applicability is as yet unclear. Other optimal schemes have been proposed by Zamir and Rose [24] and Zhang and Wei [25]; however, the gold-washing scheme of [25] does not seem to work very well in practice.

The central theme of a lossy extension of the Lempel–Ziv algorithm is the notion of approximate repetitiveness. If a portion of data recurs in an approximate sense, then subsequent occurrences can be stored as direct or indirect references to the first occurrence. This approximate recurrence of data may not be contiguous. Somewhat surprisingly, this theme of exploiting approximate repetitiveness is uniformly applicable to multimedia data from various sources such as text, images, video, and even audio. However, approximate repetitiveness can be hidden in various forms for different types of media. Therefore, one must consider different distortion measures. This is in contrast with the current state of the art, where a different approach is used for compressing each of these types of data.

Using the theoretical underpinning of [15], [23], Atallah et al. [3] implemented a lossy scheme for image compression. This scheme, called pattern matching image compression (PMIC), was based on one-dimensional pattern matching, enhanced with some salient features. It was shown that for high-quality images, the PMIC scheme is competitive with JPEG and wavelet image compression up to 1 bpp. The superiority of PMIC at decompression (PMIC does not require any computation since it basically only reads data) may turn out to be a crucial advantage in practice where asymmetric compression/decompression is a desirable feature (e.g., video). However, a bit rate of 1 bpp is not that interesting from a practical point of view since most image compression applications require bit rates of 0.5 bpp to 0.25 bpp. In this paper, we describe a new implementation based on two-dimensional (2D) approximate pattern matching that achieves compression of up to 0.25 bpp for images while preserving



other desirable properties of PMIC. We call this scheme 2D pattern matching for image compression (2D-PMIC), while for general multimedia compression we coin the term 2D-pattern matching compression (2D-PMC). We also use this idea to develop a new compression scheme for video data that uses the previous frame as the database. Our preliminary experimental results exceed even our expectations: we obtain high-quality video with bit rates of 0.15–0.5 Mbps. We should underline that this scheme does not use any prediction, interpolation, differential coding, frequency domain transforms, or quantization. When combined with all these other optimizations, the pattern matching framework is expected to yield even higher compression rates. We report on our theoretical and experimental findings in this paper. The reader is referred to http://www.cs.purdue.edu/homes/spa/Compression/2D-PMC.html or http://www.cs.purdue.edu/homes/ayg/Video/ for more image and video compression samples and results.

In this paper, we first discuss the theoretical underpinnings and subsequently present algorithmic and experimental results. Our video compression scheme is modeled well by the fixed database model (cf. [13], [20], and [23]). In this model, the first frame serves as a fixed database (reference) for compressing subsequent frames using approximate pattern matching. For this model, we report a theoretical asymptotic bit rate which turns out to be the generalized Shannon entropy. While the generalized Shannon entropy is higher than the optimal rate distortion function, we observe that, at least for memoryless sources, these two rates do not differ by much (cf. [15]). Our scheme for compressing image data can be modeled using a growing database formed by the continually changing reference corresponding to segments of the image that have already been compressed. A precise asymptotic analysis of this scheme is beyond the current state of the art. However, we rigorously formulate the problem, propose an approximate analysis, and present some experimental results.

The implementation of 2D-PMC poses a challenging problem from the algorithmic standpoint. Finding an efficient data structure for approximate search of multidimensional sets in large databases is an interesting problem in itself. We use a set of modified k-d trees enhanced by generalized run length coding for approximate search. A key issue for high-quality image and video compression is the design of an adaptive distortion measure that automatically adjusts its maximum distortion to produce perceptually high-quality results. We propose a few solutions and examine their performance in this paper.

The rest of this paper is organized as follows. Section II presents theoretical results for the fixed database model and a heuristic analysis for the growing database model. In Section III, we turn our attention to algorithmic issues and discuss salient features of our implementation. Finally, in Section IV, we present some experimental results for image and video compression, and we conclude in Section V with a brief list of advantages and drawbacks of the proposed compression scheme.

II. THEORETICAL UNDERPINNINGS

Consider a source sequence $\{X_k\}_{k=1}^{\infty}$ taking values from a finite alphabet $\mathcal{A}$ (e.g., $\mathcal{A} = \{0, \ldots, 255\}$). We write $X_m^n = (X_m, \ldots, X_n)$, and $P(x_1^n)$ for the probability of $X_1^n = x_1^n$. We encode $X_1^n$ into a compression code and the decoder produces an estimate $\hat X_1^n$ of $X_1^n$. More precisely, a code is a mapping $C_n : \mathcal{A}^n \to \{0,1\}^*$, and we write $C_n(x_1^n)$ for the compression code of $x_1^n$, where lower-case letters represent realizations of a stochastic process. On the decoding side, the decoder function $D_n : \{0,1\}^* \to \hat{\mathcal{A}}^n$ is applied to find $\hat x_1^n = D_n(C_n(x_1^n))$. Let $\ell(C_n(x_1^n))$ be the length of a code (in bits) representing $x_1^n$. Then, the bit rate is defined as $r(x_1^n) = \ell(C_n(x_1^n))/n$, the average bit rate as $\bar r_n = \mathbf{E}[r(X_1^n)]$, and the compression ratio as the ratio of the uncompressed length to the compressed length, $n \log|\mathcal{A}| / \ell(C_n(x_1^n))$.

We consider single-letter fidelity distortion measures $d : \mathcal{A} \times \hat{\mathcal{A}} \to [0, \infty)$ such that

$$d(x_1^n, \hat x_1^n) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat x_i)$$

or, more generally, a subadditive distortion measure satisfying, for any two integers $n$, $m$, and given vectors $x_1^{n+m}$, $\hat x_1^{n+m}$,

$$(n+m)\, d(x_1^{n+m}, \hat x_1^{n+m}) \le n\, d(x_1^n, \hat x_1^n) + m\, d(x_{n+1}^{n+m}, \hat x_{n+1}^{n+m}).$$

Examples of fidelity measures satisfying the above criteria include the Hamming distance and the squared error distortion. The Hamming distance $d(x, \hat x)$ equals zero if $x = \hat x$ and one otherwise; in the squared error distortion, $d(x, \hat x) = (x - \hat x)^2$. Our original PMIC was based on the squared error distortion, which is natural for image compression. In the current implementation, we develop several novel distortion measures that adjust their maximum value to produce the best possible perceptual results (cf. Section III-B).
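As a concrete illustration, here is a minimal sketch of these two fidelity measures and a D-match test; the function names and the NumPy representation are our own, not part of the original implementation.

```python
import numpy as np

def hamming(x, y):
    """Single-letter Hamming distortion averaged over the block:
    0 for equal symbols, 1 otherwise."""
    return float(np.mean(np.asarray(x) != np.asarray(y)))

def squared_error(x, y):
    """Single-letter squared error distortion averaged over the block."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((x - y) ** 2))

def matches_within(x, y, D, distortion=squared_error):
    """True if the reproduction y is within distortion level D of x."""
    return distortion(x, y) <= D
```

Any subadditive measure can be substituted for the default without changing the matching logic downstream.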

A. Fixed Database Model

Our implementation of pattern matching video compression is modeled well by the fixed database scheme. In this scheme, the decoder and the encoder both have access to the common database sequence $\tilde X_1^m$ (e.g., the first image in the video stream) generated according to the distribution $\tilde P$. The source sequence $X_1^n$ (e.g., $n = 512 \times 512$ pixels for images) is partitioned according to $\tilde X_1^m$ into variable length phrases (e.g., rectangles in 2D-PMC) $X(1), X(2), \ldots$ of lengths $L_1, L_2, \ldots$, respectively. To simplify presentation, we start with the one-dimensional case, and subsequently provide bounds for the multidimensional case.

For a given fixed distortion bound $D$ (to simplify the presentation, all our definitions refer to the one-dimensional case)

$$L_1 = \max\{k : d(X_1^k, \tilde X_i^{i+k-1}) \le D \text{ for some } 1 \le i \le m-k+1\}.$$

This implies that the first partition comprises the longest prefix of the input that matches a substring in the database $\tilde X_1^m$ to the specified tolerance. The string recovered by the decoder is therefore given by $\hat X_1^{L_1} = \tilde X_{i_1}^{i_1+L_1-1}$, where $i_1$ is the position of the matching database substring.


Subsequent partitions are computed in a similar manner. That is, if one sets $n_s = L_1 + \cdots + L_s$, the $(s+1)$st phrase of the partition is defined as

$$L_{s+1} = \max\{k : d(X_{n_s+1}^{n_s+k}, \tilde X_i^{i+k-1}) \le D \text{ for some } 1 \le i \le m-k+1\}.$$

Thus, the source sequence is partitioned according to $\tilde X_1^m$ as $X_1^n = (X(1), X(2), \ldots)$, while the decoder recovers the string $\hat X_1^n = (\hat X(1), \hat X(2), \ldots)$, which is within distortion $D$ from $X_1^n$. This follows from the subadditive property of the distortion measure. We represent each phrase by a pointer to the database and its length(s), hence its description costs $\log m + O(\log L_i)$ bits. The total compressed code length is $\sum_i (\log m + O(\log L_i))$, and the bit rate (e.g., in bits per pixel) is given by

$$r_n = \frac{\sum_i (\log m + O(\log L_i))}{n}. \tag{1}$$
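The following sketch implements this greedy phrase partitioning for the one-dimensional fixed database model, assuming squared error distortion; it is a brute-force reference with hypothetical helper names, not the k-d tree search used in the actual implementation.

```python
def longest_match(source, pos, database, D):
    """Greedy longest-prefix match of source[pos:] against a fixed database.

    Returns (pointer, length): a database offset i and the largest k found
    such that database[i:i+k] is within average squared error D of
    source[pos:pos+k].  For simplicity we stop extending a candidate at the
    first length where the running average exceeds D, a conservative
    approximation of the maximization above.
    """
    best_ptr, best_len = 0, 0
    n, m = len(source), len(database)
    for i in range(m):
        k, dist_sum = 0, 0.0
        while pos + k < n and i + k < m:
            dist_sum += (float(source[pos + k]) - float(database[i + k])) ** 2
            if dist_sum > D * (k + 1):   # average distortion would exceed D
                break
            k += 1
        if k > best_len:
            best_ptr, best_len = i, k
    return best_ptr, best_len

def partition(source, database, D):
    """Partition the source into phrases (pointer, length) as costed in (1)."""
    phrases, pos = [], 0
    while pos < len(source):
        ptr, length = longest_match(source, pos, database, D)
        if length == 0:                  # no admissible match: emit a literal
            phrases.append(("literal", source[pos]))
            pos += 1
        else:
            phrases.append((ptr, length))
            pos += length
    return phrases
```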

To formulate our main result, we introduce the generalized Shannon entropy defined as (cf. [15])

$$r_0(D) = \lim_{n \to \infty} \frac{\mathbf{E}[-\log \tilde P(B_D(X_1^n))]}{n} \tag{2}$$

where $B_D(x_1^n) = \{y_1^n : d(x_1^n, y_1^n) \le D\}$ is a ball of radius $D$ with center $x_1^n$, and $\mathbf{E}$ denotes the expectation with respect to the source distribution $P$. We prove below that the one-dimensional pattern matching compression scheme achieves, asymptotically, the bit rate $r_0(D)$. This provides a rigorous justification for the one-dimensional PMIC compression scheme implemented in [3]. Following the theorem, we comment on the 2D-PMC scheme. In particular, we show that the bit rate for 2D-PMC lies between $r_0(D)$ and $2r_0(D)$ (however, we conjecture that the lower bound is the correct bit rate).

Theorem 1: Let us consider the one-dimensional fixed database model with the database $\tilde X_1^m$ generated by a Markovian source (with a positive transition matrix) and the source sequence $X_1^n$ emitted by an independent Markovian source (also with a positive transition matrix).

i) The length of the first phrase satisfies

$$\lim_{m \to \infty} \frac{L_1}{\log m} = \frac{1}{r_0(D)} \quad \text{(pr.)} \tag{3}$$

More precisely, for any $\varepsilon > 0$

$$P\left(\left|\frac{L_1\, r_0(D)}{\log m} - 1\right| > \varepsilon\right) = O\left(\frac{1}{\log m}\right). \tag{4}$$

ii) The average bit rate attains asymptotically

$$\lim_{n, m \to \infty} \bar r_n = r_0(D). \tag{5}$$

Proof: We observe that (4) follows directly from [15] for some mixing sequences, and thus it applies to our situation. We use it in establishing lower and upper bounds for (5).

We start with the upper bound. We adopt the approach of [21] and its simplification proposed by [13]. Let us partition all phrases into "long" phrases and "short" phrases. A phrase $X(i)$ is long if $L_i \ge (1-\varepsilon)\log m / r_0(D)$ for a fixed $\varepsilon > 0$; otherwise it is a short phrase. Let $\mathcal{L}$ and $\mathcal{S}$ be the sets of long and short phrases, respectively. Observe that

$$|\mathcal{L}| \le \frac{n\, r_0(D)}{(1-\varepsilon)\log m}. \tag{6}$$

Now we proceed as follows (below $A$ is a constant that can change from line to line):

$$\bar r_n \le \frac{A}{n} + \frac{\mathbf{E}\left[\sum_{i \in \mathcal{L}} (\log m + O(\log L_i))\right]}{n} + \frac{\mathbf{E}\left[\sum_{i \in \mathcal{S}} (\log m + O(\log L_i))\right]}{n}.$$

We now evaluate the expected value of the third term above. We shall write $L$ below for a match (phrase) that can occur in any position. (Observe that $L \stackrel{d}{=} L_1$, that is, $L$ has the same distribution as $L_1$.) In the following, by $I(\cdot)$, we denote the indicator function:

$$\mathbf{E}\left[\sum_{i \in \mathcal{S}} \log m\right] \le \log m\; \mathbf{E}\left[\sum_i I(X(i) \text{ is short})\right] \le n \log m\; P(L \text{ is short})$$


where the second inequality above follows by considering not just the phrase boundaries but all possible positions in $X_1^n$ where a short match can occur. By (4), $P(L \text{ is short}) \to 0$, so the contribution of the short phrases is negligible.

We now deal with the lower bound. Let $\mathcal{K}$ be the set of those phrases whose length is not bigger than $(1+\varepsilon)\log m / r_0(D)$. Clearly

$$\bar r_n \ge \frac{\mathbf{E}[|\mathcal{K}|] \log m}{n}$$

where $|\mathcal{K}|$ is the cardinality of $\mathcal{K}$. Thus, it suffices to prove that

$$\liminf_{n, m \to \infty} \frac{\mathbf{E}[|\mathcal{K}|] \log m}{n} \ge \frac{r_0(D)}{1+\varepsilon} \tag{7}$$

to establish the desired lower bound. We start with the following known fact (cf. [1], [15], [18], [19], [22], [23]): for any $\varepsilon > 0$, we have for all $k \ge (1+\varepsilon)\log m / r_0(D)$

$$P(L_1 \ge k) \le B\, m^{-\varepsilon'} \tag{8}$$

where $B$ is a positive constant and $\varepsilon' > 0$. Indeed, the above is a consequence of $P(L_1 \ge k) \le m\, \mathbf{E}[\tilde P(B_D(X_1^k))]$ for some exponentially decaying $\mathbf{E}[\tilde P(B_D(X_1^k))]$, where the decay rate is related to the generalized Rényi entropy of the first order (cf. [1], [15], and [18]). Now, let $\mathcal{K}^c$ be the complementary set to $\mathcal{K}$. We call phrases in $\mathcal{K}^c$ "very long." Observe that

$$n = \sum_{i \in \mathcal{K}} L_i + \sum_{i \in \mathcal{K}^c} L_i \le |\mathcal{K}|\, \frac{(1+\varepsilon)\log m}{r_0(D)} + \sum_{i \in \mathcal{K}^c} L_i. \tag{9}$$

Taking the expectation of both sides of the above and setting $k = (1+\varepsilon)\log m / r_0(D)$ in (8), we obtain

$$\mathbf{E}\left[\sum_{i \in \mathcal{K}^c} L_i\right] \le n\, P(L \text{ is very long}) \le n\, B\, m^{-\varepsilon'} = o(n).$$

This establishes (7), and completes the proof of the theorem for a one-dimensional fixed database model.

Now, we are in a position to comment on the 2D fixed database model. The main difference lies in the fact that phrases in the 2D model may overlap. We shall see that this only affects the upper bound. More precisely, let us carefully investigate the three pillars of the one-dimensional analysis, namely, (4), (6), and (7). It is not difficult to see that (4) is valid in higher dimensions as long as $L_1$ is interpreted as the volume (e.g., number of pixels) of the repeated pattern. The lower bound on $\bar r_n$ [cf. (7)] was based on (9), which holds a fortiori when overlapping occurs. Unfortunately, the upper bound (6) is not valid. Nevertheless, in all our 2D implementations phrases cannot overlap more than twice. Therefore, we can replace (6) with

$$\sum_i L_i \le 2n.$$

This implies the upper bound of $2r_0(D)$ for the bit rate. In general, for the $d$-dimensional fixed database model the bit rate can be bounded as

$$r_0(D) \le \liminf_{n, m \to \infty} \bar r_n \le \limsup_{n, m \to \infty} \bar r_n \le 2^{d-1} r_0(D)$$

for $d \ge 2$. We conjecture that the lower bound is the correct bit rate.

We should also point out that $r_0(D) \ge r(D)$, where $r(D)$ is the optimal rate distortion function. In [15] we observe, at least for memoryless sources, that $r_0(D)$ and $r(D)$ do not differ by much for moderate values of $D$.

B. Growing Database Model

In contrast to video compression, where a reference frame acts as the database, image compression uses the most recent partially compressed image as a (variable) database. This can be modeled by the growing database model. Let us assume that the source sequence $X_1^\infty$ is generated according to a distribution $P$. We set the initial reproduction sequence of length $n_0$ to be the same as the first $n_0$ symbols of the source, that is, $\hat X_1^{n_0} = X_1^{n_0}$. We reveal $\hat X_1^{n_0}$ to the decoder (e.g., in a lossless transmission).

Set $i = 0$ and let $n_i$ denote the current database length. The algorithm works as follows.

• Find the first match of length $L_1$:

$$L_1 = \max\{k : d(X_{n_0+1}^{n_0+k}, \hat X_j^{j+k-1}) \le D \text{ for some } 1 \le j \le n_0-k+1\}.$$

Encode the pointer $j$ and the length $L_1$. Then we enlarge the database

$$\hat X_1^{n_1} = \hat X_1^{n_0} \oplus \hat X_j^{j+L_1-1}, \qquad n_1 = n_0 + L_1$$

where $\oplus$ implies concatenation. We send the pair in a lossless fashion to the decoder. Observe that we do not concatenate the source phrase $X_{n_0+1}^{n_0+L_1}$ but the distorted database substring $\hat X_j^{j+L_1-1}$. In this case, $d(X_{n_0+1}^{n_0+L_1}, \hat X_{n_0+1}^{n_0+L_1}) \le D$, as required.

• Provided the $i$th reproduction sequence $\hat X_1^{n_i}$ of length $n_i$ was constructed, define the $(i+1)$st match length as

$$L_{i+1} = \max\{k : d(X_{n_i+1}^{n_i+k}, \hat X_j^{j+k-1}) \le D \text{ for some } 1 \le j \le n_i-k+1\}$$


and set $n_{i+1} = n_i + L_{i+1}$. The corresponding code is the pair (pointer, length). The enlarged database now becomes

$$\hat X_1^{n_{i+1}} = \hat X_1^{n_i} \oplus \hat X_j^{j+L_{i+1}-1}$$

that we also reveal in a lossless transmission to the decoder. As before, by finding the largest match in the distorted database, we assure that $d(X_{n_i+1}^{n_{i+1}}, \hat X_{n_i+1}^{n_{i+1}}) \le D$ for given $D$.

Our goal is to estimate the operational bit rate at step $i$, defined as

$$r_{n_i} = \frac{\log n_i + O(\log L_{i+1})}{L_{i+1}}. \tag{10}$$
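A minimal sketch of this growing database encoder follows, reusing the hypothetical longest_match() helper from the fixed-database sketch above; note how the distorted phrase, not the original source symbols, is appended to the database.

```python
def growing_db_encode(source, n0, D):
    """Sketch of the growing database encoder.

    The database starts as the first n0 source symbols (revealed losslessly)
    and is extended with the *distorted* reproduction of every phrase, so
    the encoder and decoder databases stay identical.
    """
    database = list(source[:n0])             # initial reproduction sequence
    code = [("raw", database[:])]            # sent to the decoder losslessly
    pos = n0
    while pos < len(source):
        ptr, length = longest_match(source, pos, database, D)
        if length == 0:                      # no match: transmit one literal
            database.append(source[pos])
            code.append(("literal", source[pos]))
            pos += 1
            continue
        code.append((ptr, length))           # pointer + length, entropy coded
        # Concatenate the distorted database substring, not the source phrase.
        database.extend(database[ptr:ptr + length])
        pos += length
    return code
```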

Observe that in the growing database model consecutive databases $\hat X_1^{n_i}$ are nonstationary with, say, probability measures $P_i$, provided they exist. We present a heuristic analysis of the growing database model under the following hypothesis [H] that must be verified rigorously if one hopes for a precise analysis of the model.

[H] At step $i$ of the algorithm, let the database size be $n_i$ with underlying probability measure $P_i$. There exists a constant $r_0^{(i)}(D)$ that depends on $P_i$ such that in probability and on average

$$L_{i+1} \sim \frac{\log n_i}{r_0^{(i)}(D)} \tag{11}$$

for $n_i \to \infty$.

From the description of the algorithm, we have $n_{i+1} = n_i + L_{i+1}$, hence after using [H] we arrive at

$$n_{i+1} \approx n_i + \frac{\log n_i}{r_0^{(i)}(D)}$$

as long as $n_i \to \infty$. In view of this and (10), the operational bit rate at step $i$ is close to $r_0^{(i)}(D)$. Finally,

$$\lim_{i \to \infty} r_{n_i} = r_0(D) \tag{12}$$

provided the limit $r_0(D) = \lim_{i \to \infty} r_0^{(i)}(D)$ exists and $n_i \to \infty$.

This approximate analysis relies crucially on the hypothesis [H]. Its validity depends on how nonstationary the distorted database is. Our preliminary theoretical investigations indicate that one may construct a nonstationary database such that the asymptotics of (11) do not hold (cf. Fig. 1), and one must work with the corresponding $\liminf$ and $\limsup$. But, even if [H] holds, the compression ratio (12) may still not converge as $i \to \infty$. Fig. 1 shows the variation of the bit rate during 2D-PMIC (growing database model). The right-hand side plot in Fig. 1 indicates that the average bit rate may not converge to a limit.

III. ALGORITHMS, DATA STRUCTURES, AND IMPLEMENTATION

In this section, we describe algorithmic and implementation issues related to the compression scheme. The scheme relies on a range of techniques centered around 2D pattern matching used in conjunction with variable adaptive distortion and run-length encoding. The key computational kernel is a k-d tree based matching heuristic along with a region growing scheme that identifies maximal matches in an image. The corresponding decompression scheme expands matched patterns and run-length encoded segments. Due to its simplicity, the decompressor runs extremely quickly and requires minimal computation. This is crucial for many applications where one aims to display images and video "on-the-fly" while downloading. This is particularly important for video applications since current schemes (e.g., AVI, QuickTime, and MPEG) require significant software support on the decompression side.

The rest of this section addresses coding techniques based on 2D matching, run-length encoding, and lossless encoding; distortion metrics; and algorithmic complexities, along with implementation details for image and video streams.

A. Encoding Techniques

The compression scheme proposed here uses a combination of three techniques to code data: 2D pattern matching, enhanced run-length encoding (ERLE), and lossless coding. These techniques are applied to progressively code the image. Assuming that a part of an image has been previously encoded, the key task is to encode the pixels located to the right of and below a point we call the anchor point [cf. Fig. 2(c)]. The selection of the next anchor point is based on a growing heuristic (cf. [4]). The growing heuristic we adopt is the "wave-front" scheme, which sweeps the image from top to bottom. The anchor point at an intermediate stage in the compression of the image San Francisco is illustrated in Fig. 2(c). Other heuristics grow regions in a circular manner from the center of the image or expand from the main diagonal. These heuristics have been observed to yield similar results [4]. Once the anchor point has been identified, the partial image is coded using either 2D pattern matching, ERLE, or lossless coding. We observe that for all our growing heuristics phrases do not overlap more than twice.
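One simple realization of the wave-front heuristic is to take the next anchor as the first still-uncoded pixel in a top-to-bottom, left-to-right sweep; the boolean coverage-mask representation below is our own, not taken from the paper.

```python
import numpy as np

def next_anchor(covered):
    """Wave-front growing heuristic (sketch): the next anchor point is the
    topmost, leftmost pixel not yet covered by any coded region."""
    ys, xs = np.nonzero(~covered)     # uncovered pixels, row-major order
    if ys.size == 0:
        return None                   # the whole image has been coded
    return int(ys[0]), int(xs[0])
```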

1) Two-Dimensional Pattern Matching: Two-dimensional pattern matching is the most efficient way of compressing images among the methods we use in 2D-PMC. The basic idea is to find a 2D region (rectangle) in the uncompressed part of the image that occurs approximately in the compressed part (i.e., database), and to store a pointer to it along with the width and the length of the repeated rectangle. This is illustrated in Fig. 2(c). The notion of approximate occurrence is discussed in greater detail in Section III-B along with distortion measures. Observe that the objective is to search for the largest such area. Consequently, a brute force search algorithm is too time consuming. Indeed, if the database has $N_1$ pixels and the as-yet uncompressed region contains $N_2$ pixels, then the complexity of the algorithm is $O(N_1 N_2)$ (cf. [3]). This is too expensive since we have to apply it on average once per phrase, that is, $n / \mathbf{E}[L]$ times on an image.

Fig. 1. Variation of compression rate as the pattern matching process proceeds (left column) and cumulative bpp of the compression process (right column).

The problem of identifying maximal matches in a database at an anchor point can be accomplished in two stages: i) identifying candidate seed points in the database at which potential matches can be found and ii) growing regions around seed points and selecting the one that yields the best compression ratio with respect to the selected distortion metric. Note that the selection of a matching region is a locally greedy process and does not guarantee a globally optimal compression ratio.

Fig. 2. Pattern matching and enhanced run-length encoding techniques in PMIC demonstrated for the sample image San Francisco.

To identify candidate seed points, we consider a template of pixels around the anchor point. For example, assume that this template is a 3 × 2 region with the anchor point as the top-left pixel in the template. The resulting template corresponds to a six-dimensional (6D) vector. Fig. 3 illustrates the vectors for two templates of dimension 2 × 3 and 3 × 2. The problem of finding candidate seed points then becomes one of finding all vectors in a 6D space that match these vectors at the anchor point in an approximate sense. In the example in Fig. 3, this corresponds to finding all approximate instances of the vectors [100, 63, 65, 63, 105, 65] and [100, 63, 63, 105, 65, 18] and flagging them as candidate seed points for pattern matches. In general, searching for a single template does not yield the best compression. Our pattern matching scheme allows users to define multiple templates and generates candidate seed points for all of these templates. Typical template sizes used for our compression results are 2 × 3, 3 × 2, 1 × 5, and 5 × 1.
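A sketch of the template-vector extraction, under the assumption that templates are given as lists of pixel offsets; the 2 × 3 and 3 × 2 offset lists below mirror Fig. 3, but the representation and names are ours.

```python
import numpy as np

# Offsets (dy, dx) relative to the anchor point.
TEMPLATES = [
    [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)],   # 2 rows x 3 columns
    [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)],   # 3 rows x 2 columns
]

def template_vector(image, anchor, offsets):
    """Flatten the template pixels around an anchor point into a 6D vector."""
    ay, ax = anchor
    return np.array([image[ay + dy, ax + dx] for dy, dx in offsets])
```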

Search for template matches is performed using k-d trees (cf. [8] and [14]). A one-dimensional tree is an ordinary binary search tree; a 2D tree is similar to a one-dimensional tree, except that the nodes at even levels compare x-coordinates and the nodes at odd levels compare y-coordinates when branching. In general, a k-d tree has nodes with k coordinates, and branching at each level is based on only one of the coordinates; for example, we might branch on coordinate number $h \bmod k$ at level $h$. Our implementation of k-d tree search resembles the idea proposed in [4]. To search for a k-dimensional vector using k-d trees, we trace the path from the root down to one of the terminal nodes. The decision whether to go left or right at a node is performed on the $(h \bmod k)$th coordinate, where $h$ is the depth of the node. This is illustrated in Fig. 4 in the context of a 6D tree searching for a given 6D vector. Since our 2D pattern matching scheme allows the use of multiple templates, one such tree is constructed for each template and the various templates at each anchor point are searched through their respective k-d trees.

Fig. 3. Two templates defined around an anchor point and corresponding 6D vectors that must be searched in the k-d tree formed by the database.

In our implementation we build a k-d tree of bounded height MaxDepth in which the mean value of the range is used for branching (discriminating) at each node. For example, if the initial range is [0, 255], the first discriminating value is 128, the second one (for this coordinate) is 64 (for the left subtree) or 192 (for the right one), and so on (cf. Fig. 4, where the branching decisions at level seven are 64 and 192, respectively). The terminal nodes are called buckets since they contain one or more leaves. Leaf nodes store data corresponding to the vector and the coordinates of the associated pixels. To insert a leaf, we start from the root and follow the path until we find a bucket. There are three possibilities (a code sketch follows the list):

• The bucket is empty; if so, we insert the leaf into it.
• The depth of this branch is already equal to the maximum depth MaxDepth. In this case, we simply add the leaf to the bucket, which is maintained as a chain of leaves.
• The depth of the node is less than the maximum depth, but the associated bucket is full. In this case, we expand the node to its children and select the branching decision as the mean of the node range. All of the leaves in the bucket are inserted into the buckets at the corresponding children and the original leaf is reinserted.
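The following sketch shows one way to realize this bounded-height bucket k-d tree; the bucket capacity, the exclusive upper range bound, and the one-level re-routing on a split are simplifications of ours, not details taken from the paper.

```python
class Node:
    """Internal node or bucket of the bounded-height k-d tree."""
    __slots__ = ("left", "right", "bucket")
    def __init__(self):
        self.left = self.right = None
        self.bucket = []                         # leaves: (vector, pixel_ref)

class BucketKdTree:
    """Bounded-height k-d tree with buckets.

    Depth h branches on coordinate h % dim, discriminating on the midpoint
    of that coordinate's current subrange (128, then 64 or 192, and so on).
    """
    def __init__(self, dim, max_depth=30, bucket_cap=4, lo=0, hi=256):
        self.dim, self.max_depth, self.cap = dim, max_depth, bucket_cap
        self.lo, self.hi = lo, hi
        self.root = Node()

    def insert(self, vec, ref):
        node, depth = self.root, 0
        ranges = [[self.lo, self.hi] for _ in range(self.dim)]
        while node.left is not None:             # descend to a bucket
            node = self._child(node, vec, ranges, depth)
            depth += 1
        node.bucket.append((vec, ref))
        if len(node.bucket) > self.cap and depth < self.max_depth:
            c = depth % self.dim                 # expand the full bucket
            mid = sum(ranges[c]) // 2
            node.left, node.right = Node(), Node()
            for v, r in node.bucket:
                (node.left if v[c] < mid else node.right).bucket.append((v, r))
            node.bucket = []

    def _child(self, node, vec, ranges, depth):
        c = depth % self.dim
        mid = sum(ranges[c]) // 2                # mean of the current range
        if vec[c] < mid:
            ranges[c][1] = mid
            return node.left
        ranges[c][0] = mid
        return node.right
```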

Since we search for an approximate occurrence of a template, we do a range search on a k-d tree with two arguments: a minimum vector $v_{\min}$ and a maximum vector $v_{\max}$. These vectors are determined by the distortion measures discussed in Section III-B. We search for all $k$-dimensional vectors $v$ such that $v_{\min}^j \le v^j \le v_{\max}^j$ for every coordinate $j$. This is done recursively by keeping every subtree that may contain a solution and discarding the others. For example, if the first discriminating value is 128, then for a query range that straddles 128 we explore both the left and right subtrees of the root, while for a query range that lies entirely below 128 we just explore the left subtree.
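A sketch of the corresponding range search over the bucket k-d tree sketched above; it keeps each subtree whose coordinate subrange overlaps the query range and discards the rest.

```python
def range_search(tree, vmin, vmax):
    """Report every stored (vector, ref) in the BucketKdTree with
    vmin[c] <= vector[c] <= vmax[c] for all coordinates c."""
    out = []

    def visit(node, ranges, depth):
        if node.left is None:                    # bucket: test each leaf
            out.extend((v, r) for v, r in node.bucket
                       if all(vmin[c] <= v[c] <= vmax[c]
                              for c in range(tree.dim)))
            return
        c = depth % tree.dim
        mid = sum(ranges[c]) // 2
        if vmin[c] < mid:                        # query overlaps left subrange
            saved = ranges[c][1]
            ranges[c][1] = mid
            visit(node.left, ranges, depth + 1)
            ranges[c][1] = saved
        if vmax[c] >= mid:                       # query overlaps right subrange
            saved = ranges[c][0]
            ranges[c][0] = mid
            visit(node.right, ranges, depth + 1)
            ranges[c][0] = saved

    visit(tree.root, [[tree.lo, tree.hi] for _ in range(tree.dim)], 0)
    return out
```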

To get an idea of the complexity of this search, we first review some known results. Flajolet and Puech [7] present a detailed analysis of random k-d trees. Among other things, they prove


Fig. 4. A 6D tree (range 0–255) with all left edges signifying a "less than" decision and right edges signifying a "greater than or equal to" decision.

that the average search time for a fully specified key in a k-d tree built from $n$ randomly inserted nodes is $O(\log n)$. However, the situation is more intricate when some (say $s$ out of $k$) coordinates are not specified. Then the search time increases to $O(n^{1-s/k+\theta(s/k)})$, where $\theta(\cdot)$ is a small positive function. If the k-d tree is balanced with height $O(\log n)$, then Lee and Wong [14] prove that a range search of rectangle areas is bounded by $O(n^{1-s/k} + F)$, where $s$ of the coordinates are restricted to subranges and there are $F$ such records reported. Finally, Friedman et al. [8] showed that if the search area is small and near cubical, then the search time is $O(\log n)$.

The k-d tree search used here falls under the model of [8]. To obtain a more precise estimate of the search time, let us bound the number of paths explored per coordinate assuming the maximum depth is $h$ and the search range has width $2\delta$ for every coordinate. We assume that the initial range is $[0, R]$ (e.g., [0, 255]). Each coordinate is then split into at most $2^{h/d}$ subranges of width $R/2^{h/d}$, so the maximum number of subranges that overlap with the search range is

$$\beta = \left\lceil \frac{2\delta\, 2^{h/d}}{R} \right\rceil + 1. \tag{13}$$

Thus, for a problem of dimension $d$, we must search at most $\beta^d$ buckets, and a bound for the maximum number of comparisons is $O(\beta^d h)$. Actual search performance of a k-d tree is illustrated by the following example:

• dimension: 6;
• database size: random vectors (eight bits per coordinate);
• maximum tree depth: 30 (five levels per coordinate);
• solutions found: 72;
• search time: 2.7 ms (8.11 s for 3000 searches) on a PII 300 MHz.

However, as mentioned previously, in order to find all potential matches, we actually use several k-d trees, each one associated with a different template. Furthermore, to reduce the number of potentially noninteresting solutions, pixels and their associated vectors are grouped by location (into rectangles whose typical size is 30 × 30) and one k-d tree is used per location, so that we can perform a local search. Our implementation of the k-d tree follows the idea of [4]. The data structure used to represent a k-d tree is a set of two arrays: the array of nodes and the array of leaves. A cell of the node array is composed of two pointers that can be used in two different ways:

• if the first pointer is equal to a special value IsLeaf, then the node is actually a leaf, and the contents of the leaf can be found in the cell of the leaf array indicated by the second pointer;
• else the two pointers point to the left and right children. A NULL pointer represents an empty bucket.

The structure of a cell of the leaf array is as follows:

• the vector used to index the leaf;
• a reference (i.e., the coordinates of the point associated with the color vector);
• a pointer to the next leaf. This is used to chain leaves if the bucket contains more than one leaf.

Finally, the procedure to determine the largest 2D match works as follows: for each template, we extract the pixel vector associated with the anchor point and search for approximate occurrences in the k-d trees built over the pixel vectors of the neighboring locations. For each solution, we check if it has already been found when searching for other templates. If not, we apply the same algorithm as for the enhanced run-length encoding (discussed in Section III-A2) to find the best matching shape. If the number of solutions exceeds a certain limit, we abort the search.


2) Enhanced Run-Length Encoding: Run-length encoding (RLE) of images identifies regions of the image with constant pixel values. We enhance RLE by giving it the capability of coding regions in which pixel values can be (approximately) modeled by a planar function. We call this technique enhanced run-length encoding (ERLE). ERLE approximates, in the least-squared error sense, a given grid of pixels by a planar surface $c_0 + c_1 x + c_2 y$. The coefficients of the planar surface are computed by solving the system of normal equations associated with the least-squared error procedure. Once the planar surface is determined, a sub-segment of the grid that is within the distortion distance is identified and coded using ERLE. We observe that this is particularly useful for synthetic images typically found on the Web.
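A minimal sketch of the planar fit; NumPy's least-squares solver minimizes the same objective as the normal equations of the text, and returning the maximum per-pixel error lets the caller test the block against the distortion range. The function name is ours.

```python
import numpy as np

def fit_plane(block):
    """Least-squares fit of the plane c0 + c1*x + c2*y to a 2D pixel block."""
    h, w = block.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.column_stack([np.ones(h * w), xs.ravel(), ys.ravel()])
    b = block.ravel().astype(float)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)   # solves the LS problem
    max_err = float(np.max(np.abs(A @ coeffs - b)))  # worst per-pixel error
    return coeffs, max_err
```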

To identify the largest subsegment of a given grid that can be coded using ERLE, we use the following procedure (a code sketch follows the list).

1) Initialize the height $\ell$ to an arbitrary small constant.
2) Test if the rectangle anchored at the anchor point of size $\ell \times \ell$ can be encoded using ERLE within the given distortion distance.
3) If step 2 gives a positive answer, double $\ell$.
4) Let $\ell^*$ be the largest $\ell$ found through the above procedure.
5) Apply the same strategy to find the largest width $w^*$.
6) If $\ell^* \ge w^*$, then find by dichotomy the greatest width $w$ so that the $\ell^* \times w$ rectangle can be ERL encoded.
7) If $\ell^* < w^*$, swap the roles of the x-coordinate and the y-coordinate in step 6 above.
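A sketch of the doubling-then-dichotomy search; can_encode(h, w) is a caller-supplied (hypothetical) predicate that fits a plane to the h × w rectangle at the anchor point and tests it against the distortion range. For brevity the sketch fixes the height first and then widens, a simplification of steps 5)–7).

```python
def largest_erle_rect(can_encode, max_h, max_w):
    """Largest ERLE rectangle at the anchor point, assuming can_encode(1, 1)."""
    def grow(limit, check):
        size = 1
        while size * 2 <= limit and check(size * 2):     # doubling phase
            size *= 2
        lo, hi = size, min(size * 2, limit)              # dichotomy phase
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if check(mid):
                lo = mid
            else:
                hi = mid - 1
        return lo

    best_h = grow(max_h, lambda h: can_encode(h, 1))
    best_w = grow(max_w, lambda w: can_encode(best_h, w))
    return best_h, best_w
```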

It is possible that a single anchor point may have a matching pattern as well as an ERLE representation. In such cases, we compare the cost (in terms of the number of bits used to store the ERLE parameters versus the pattern matching parameters per pixel of the compressed sub-image) and select the better one. Note that the number of bits used is calculated according to the distribution model used by the lossless compressor discussed in Section III-A3. This important fact allows us to save some more memory during the lossless encoding phase. Another important observation relates to the fact that we might have overlapping coded segments (i.e., pixels with multiple representations). This is not a problem since either representation of the pixel is within the specified distortion bound. Furthermore, the pixel value is set to the value determined by the last coded instruction on the decompressor side. This prevents segmentation of the image and artifacts such as specking.

3) Lossless Encoding: Once the lossy image compression is completed, the image description is a script of (instruction, parameters) pairs, where the instruction indicates lossless transmission, ERLE, or a 2D match, and where the parameter set is different for each instruction. Here, a lossless instruction indicates that the next pixels are sent without compression, an ERLE instruction indicates that we use the ERLE, and a match instruction refers to the 2D pattern matched encoding. The stream sent to the arithmetic encoder is thus an interleaving of instructions and parameters, with different ranges and distributions for each instruction/parameter.

Fig. 5. Typical histograms of (a) the height of matching areas and (b) the coefficient $c_0$ in the ERLE coding that fits a plane $c_0 + c_1 x + c_2 y$ on a subimage.

Since the distribution of the parameter values is not uniform (cf. Fig. 5), we can further improve compression by losslessly encoding this data. To make this compression efficient, a distribution model is built for each parameter based on the previously observed values, and the stream is encoded using arithmetic coding. Since we want this compression method to be suitable for web-based applications, the instructions/parameters are multiplexed in the same arithmetic coding stream, so that the image/video can be displayed on the fly while downloading. We implement our own arithmetic encoder based on the ideas discussed in [12], hence we refrain from describing it in detail.
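One adaptive distribution model per parameter might look as follows; the Laplace-style initialization is our choice, and the interface to the actual coder of [12] is elided.

```python
from collections import Counter

class AdaptiveModel:
    """Adaptive frequency model for one instruction/parameter stream.

    Keeping a separate model per parameter lets the arithmetic coder exploit
    the skewed histograms of Fig. 5, and updating only after a symbol is
    coded keeps encoder and decoder models in lockstep so the multiplexed
    stream can be decoded on the fly.
    """
    def __init__(self, nsymbols):
        # Start with count 1 per symbol so no probability is ever zero.
        self.freq = Counter({s: 1 for s in range(nsymbols)})
        self.total = nsymbols

    def probability(self, symbol):
        """Current estimate handed to the arithmetic coder."""
        return self.freq[symbol] / self.total

    def update(self, symbol):
        """Adapt after (de)coding; both sides apply the same update."""
        self.freq[symbol] += 1
        self.total += 1
```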

B. Distortion Measures

The process of lossy compression can be viewed as adding "noise" to the image. For this reason, we also refer to the distortion level as the noise level. The original pattern matching scheme PMIC [3] computed the distortion per symbol for a block of data (averaged over the block). This distortion measure was called uniform distortion in [3]. We observed experimentally, however, that uniform distortion leads to significant perceptual errors. To avoid this, in 2D-PMC we adopt an alternate approach in which we assign to each individual pixel a range (i.e., lower and upper bounds on the allowable distortion value per pixel) that depends on the values of the surrounding pixels. The basic idea is to be less tolerant (i.e., have a small range) in slowly varying regions and more tolerant in quickly varying regions. We have experimentally evaluated several approaches and describe some of these below.

1) Local Distortion Range: In order to reproduce slowly varying regions more accurately, we introduce differences between an individual pixel value and its surrounding neighbors. The range assigned to the pixel located at $(x, y)$ combines two terms: the maximum distortion assigned in the uniform distortion method, plus an additional distortion proportional to the maximum of the absolute values of the differences between the pixel value at $(x, y)$ and its 8 neighbors. The drawback of this method is that it uses only the 8 closest pixels and consequently is very sensitive to noise.
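A sketch of one plausible reading of the local distortion range; the exact way the neighborhood term is scaled and combined with the uniform level is not recoverable from the text, so the formula below is an assumption.

```python
import numpy as np

def local_range(image, x, y, d_uniform, d_extra):
    """Per-pixel distortion tolerance from the 8-neighborhood.

    Tolerance = uniform maximum distortion + an additional term growing
    with the largest absolute difference to the 8 neighbors (assumed
    scaling; not taken from the paper).
    """
    h, w = image.shape
    window = image[max(0, y - 1):min(h, y + 2),
                   max(0, x - 1):min(w, x + 2)].astype(int)
    delta = int(np.max(np.abs(window - int(image[y, x]))))
    return d_uniform + d_extra * min(1.0, delta / 255.0)
```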

2) Global Distortion Range: To counter the ill effects of the locality of the previous method, we introduce a function that assigns a large range only when there are few pixels of similar value around the pixel in question.

We now show how to compute the range for the pixel located at $(x_0, y_0)$. Define the weight $w(x, y)$ of a surrounding pixel so that $w(x_0, y_0) = 1$ and the weight decreases with the distance from $(x_0, y_0)$, vanishing beyond the radius $r$ (cf. Fig. 6) of a circle surrounding the pixel. Denoting the pixel value at location $(x, y)$ by $p(x, y)$, we accumulate over the circle a weighted similarity sum $S$ together with the total weight $W$, scaled by two constants chosen so that $0 \le S/W \le 1$ for all pixels. The parameter $S/W$ should be close to 1 for a slowly varying area (e.g., if all pixels within the circle have values similar to $p(x_0, y_0)$, then $S/W \approx 1$), and should be smaller than 1 for a quickly changing area. Finally, the range for pixel $(x_0, y_0)$ is computed from $S/W$, with one parameter chosen close to 1 and another constant controlling the maximum additional distortion, so that a large range is assigned only when few pixels of similar value surround the pixel. Thus, the pixel value of $(x_0, y_0)$ can take values in an interval centered at $p(x_0, y_0)$ whose half-width is this range.

Fig. 6. Illustration of the global distortion range.

We observe that the variation in the range, and hence the distribution of errors, in this method is less specky than the one computed with the local distortion. This method gives better visual results than the previous one because it hides the compression noise more efficiently. The difficulty in applying this method lies in tuning its parameters in order to obtain the best visual quality of an image.

TABLE I
COMPRESSION AND DECOMPRESSION TIMES (IN SECONDS) FOR FOUR IMAGES ON A 400 MHz PENTIUM II

C. Overall Space and Time Complexity

We argue that the overall time complexity of the compression algorithm is $O(n)$ on average and $O(n \log n)$ in the worst case, where $n$ is the number of pixels; the corresponding decompression time is $O(n)$ [but on average only $O(n/\log n)$ pointer reads are required], and the space complexity is $O(n)$.

Let us first consider the compression algorithm. The most time consuming step is the two-dimensional pattern matching, which is applied with high probability $O(n/\log n)$ times since we proved in Theorem 1 that a typical repeated rectangle has $O(\log n)$ pixels. In the worst case we need to apply the 2D pattern matching search $O(n)$ times. The search time in a k-d tree is $O(\log n)$, hence the total average-case time complexity is $O(n)$ and the worst-case complexity is $O(n \log n)$. Unfortunately, the constant in front of $\log n$ is large due to the many potential searches allowed in the k-d tree; that is, it can be as large as $\beta^d$, where $\beta$ is defined in (13) (cf. Section III-A1). This constant can be reduced by limiting the search window for potential seed points for pattern matching. Typical compression times for four images are shown in Table I.

As mentioned previously, the decompression is extremely simple and very fast. It requires on average only $O(n/\log n)$ read operations, and it does not require any additional computation if arithmetic coding is not used. This is clearly evidenced by the decompression times in Table I. The simple and inexpensive decompression scheme allows one to send the decompression program along with the data when downloading images from the web, obviating the need for decoding software on the decompression side. Such a JAVA decompressor is being implemented and preliminary results will be reported in a forthcoming paper.

The space complexity is clearly $O(n)$. To get an idea of actual memory requirements, we note that k-d trees for "Lena," "Basselope," and "Banner" require 31.6 MB, 21 MB, and 2 MB, respectively. While most current PCs easily support such memory requirements, these can be further improved. Our implementation maintains k-d trees corresponding to entire images. However, using the assumption that likely matches are found in the vicinity of the anchor point, we can reduce the space requirement considerably.


TABLE II
VIDEO "TRAIN" WITH AND WITHOUT MODEL PRE-FILTERING

D. Video Implementation

Pattern matching video compression uses a decompressed representation of the previous frame, as compressed in our framework, as the database for compressing the current frame. We refer to this decompressed representation as a lossy image. The reason for using the lossy image as a database is that we want the decompressed image at the client side to be within a distortion bound of the original image. Since the decompressor knows the previous frame only in its compressed form (and its decompressed rendering), this must be used as the database for pattern matching compression. In contrast, if the original previous frame were used as the database, this database would not be available at the decompressor; consequently, errors would propagate quickly through subsequent frames during decompression. Since the previous compressed frame (in its decompressed form) forms a static database, the compression process is modeled well by our analytical framework of Section II-A.

Applying this framework directly to video compression, we noticed that the matched patterns were often quite small, leading to low compression ratios. This is a consequence of applying distortion (i.e., approximate pattern matching) to each individual pixel instead of averaging over the matched area. To circumvent this problem, we apply pattern matching to a low-pass filtered version of the image. In principle, to decide whether or not two areas are alike, we compare their filtered versions. If we find a good match in the filtered image, we apply it to the nonfiltered image. Experimental results indicate that for some samples of video for which the compression did not work well (low compression ratio), this method could improve the performance enormously without noticeable loss in quality. Table II compares the compression with and without the low-pass filter.

To implement this idea, we keep two copies of the previous frame: one as it has been encoded, and another that is a filtered version. We also keep at hand a plain and a filtered version of the image being encoded. We populate the k-d trees with templates from the filtered version of the previous image. When trying to find a match, we search the k-d trees for a template that is at the location pointed to by the anchor point in the filtered image. On finding a match, we store a pointer to it. Notice, however, that during the decoding phase, since the decoder knows nothing about the filtering, it will make a copy of the nonfiltered pixels. Thus, the image reproduced is still sharp, even though we use a smoothed version to determine "motion vectors."

TABLE III
COMPARISON OF COMPRESSION RESULTS FROM 2D-PMIC AND JPEG FOR DIFFERENT IMAGES

The low-pass filtered image is computed using a convolution matrix $M$. That is, the new pixel values of the smoothed image become

$$\hat p(x, y) = \sum_{i, j} M(i, j)\, p(x + i, y + j)$$

where $p(x, y)$ are the pixel values of the original image. We tried three low-pass filters: two defined by 3 × 3 convolution matrices with different frequency responses, and a third identical to the second but with a matrix of dimension five. We observed that the second filter yields the best performance.
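A sketch of this pre-filtering step; since the paper's two convolution matrices are not reproduced here, the 3 × 3 averaging kernel below is only a hypothetical stand-in with the same low-pass intent.

```python
import numpy as np

def low_pass(image, kernel=None):
    """Smooth a frame before populating the k-d trees with templates."""
    if kernel is None:
        kernel = np.full((3, 3), 1.0 / 9.0)    # hypothetical stand-in kernel
    kh, kw = kernel.shape
    pad_y, pad_x = kh // 2, kw // 2
    padded = np.pad(image.astype(float),
                    ((pad_y, pad_y), (pad_x, pad_x)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for dy in range(kh):                       # direct 2D convolution
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + image.shape[0],
                                           dx:dx + image.shape[1]]
    return out
```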


Fig. 7. Comparison of image quality using 2D-PMIC and JPEG compression.

IV. EXPERIMENTAL RESULTS

In this section, we present some experimental results for image and video compression. Pattern matching image compression is compared to JPEG compression in Table III and Figs. 7 and 8. Several observations can be made from these results.

• For synthetic images (Banner and Basselope), 2D-PMIC outperforms JPEG both in terms of compression ratios as well as quality. This is also evident from Fig. 7. A comparison between 2D-PMIC and wavelet compression shows a similar superiority of 2D-PMIC [3].

• For compression of general images, 2D-PMIC is competitive with JPEG and wavelet compression up to 0.25 bpp. But in Fig. 8, 2D-PMC seems to be (perceptually) superior to JPEG for San Francisco at 0.5 bpp, even though the PSNR is better for the JPEG image (24.3 dB versus 23.5 dB for 2D-PMC).

• At very low bit rates, frequency domain methods are still better than 2D-PMC for natural images.

These results indicate that for synthetic images and higher bit rates, 2D-PMC is the method of choice, particularly because of the simplicity of the decompression procedure.

In Table IV we show some results of video compression for two video clips, namely "Claire" and "Ping_Pong." We compress them using 2D-PMC with the uncompressed form of the previously compressed frame as the database. A demo is set up at http://www.cs.purdue.edu/homes/spa/Compression/2D-PMC.html and http://www.cs.purdue.edu/homes/ayg/Video/ for the reader's evaluation. Table IV presents results for "low-quality" (lq) and "high-quality" (hq) compression. It is clear from the table that 2D-PMC yields excellent compression ratios and quality while supporting very high decompression rates (speed of decompression). Indeed, by augmenting this technique with inter-frame prediction (in a manner similar to MPEG), the performance can be further enhanced. The objective here, however, is to demonstrate the power of the pattern matching framework upon which the rest of the conventional compression machinery can build. We have recently implemented a JAVA decompressor that can be transmitted with the compressed video, obviating the need for the installation of decompression clients (cf. http://www.cs.purdue.edu/homes/ayg/Video/).

Fig. 8. San_Francisco and Lena at 0.5 bpp: Comparison of image quality using JPEG and 2D-PMC compression.


TABLE IV
VIDEO COMPRESSION RESULTS: LOW QUALITY (LQ) AND HIGH QUALITY (HQ)

V. CONCLUDING REMARKS

In this paper, we have presented a 2D pattern matching framework for compression of image and video data. The following conclusions can be made.

• 2D-PMIC yields excellent compression ratios and is competitive with JPEG and wavelet based compression up to 0.25 bpp. 2D-PMIC works better on synthetic images while transform-based methods work better on natural images. In general, 2D-PMIC works better when high frequency components dominate the image.

• 2D-PMC is slower in terms of compression time but extremely fast and simple in terms of decompression. Decompression can be done on the fly with minimal software support at the client.

• 2D-PMC naturally adapts to variable resource availability by changing the distortion bound. This results in higher compression rates as well as faster compression times.

• The pattern matching framework of 2D-PMC can be used in conjunction with many of the current techniques to leverage the benefits of these schemes.

We are currently working on improving the pattern matching framework and combining it with other conventional techniques based on frequency domain methods. More work is needed on automatic selection of parameters for the pattern matching scheme. 2D-PMC for video is particularly promising, and we are exploring it actively as a part of our ongoing research. When applied in conjunction with prediction and frequency transforms, our framework has the potential to yield significantly improved performance. Alternately, the pattern matching framework can itself be used as a homogeneous compression technique for textual, audio, image, and video data, leading to a uniform multimedia compression standard.

ACKNOWLEDGMENT

The authors would like to acknowledge valuable discussions with I. Kontoyiannis (Brown University) and Y. Reznik (RealNetworks, Inc.). They also would like to thank the two referees whose comments were valuable in improving the presentation.

REFERENCES

[1] R. Arratia and M. Waterman, "The Erdős–Rényi strong law for pattern matching with given proportion of mismatches," Ann. Prob., vol. 17, pp. 1152–1169, 1989.
[2] D. Arnaud and W. Szpankowski, "Pattern matching image compression with prediction loop: Preliminary experimental results," in Proc. Data Compression Conf., Snowbird, UT, 1997.
[3] M. Atallah, Y. Génin, and W. Szpankowski, "A pattern matching image compression: Algorithmic and empirical results," IEEE Trans. Pattern Anal. Machine Intell., vol. 21, pp. 618–627, July 1999.
[4] C. Constantinescu and J. A. Storer, "Improved techniques for single-pass adaptive vector quantization," Proc. IEEE, vol. 82, pp. 933–939, 1994.
[5] A. Dembo and I. Kontoyiannis, "The asymptotics of waiting times between stationary processes, allowing distortion," Ann. Appl. Prob., vol. 9, pp. 413–429, 1999.
[6] W. Finamore, M. Carvalho, and J. Kieffer, "Lossy compression with the Lempel–Ziv algorithm," in Proc. 11th Brasilian Telecommunication Conf., 1993, pp. 141–146.
[7] P. Flajolet and P. Puech, "Partial match retrieval of multidimensional data," J. ACM, vol. 33, pp. 371–407, 1986.
[8] J. H. Friedman, J. Bentley, and R. Finkel, "An algorithm for finding best matches in logarithmic expected time," ACM Trans. Math. Softw., vol. 3, pp. 209–226, 1977.
[9] A. Gersho and R. Gray, Vector Quantization and Signal Compression. Norwell, MA: Kluwer, 1992.
[10] J. Gibson, T. Berger, T. Lookabaugh, and R. Baker, Multimedia Compression: Applications & Standards. San Mateo, CA: Morgan Kaufmann, 1998.
[11] P. G. Howard, "Lossless and lossy compression of text images by soft pattern matching," in Proc. Data Compression Conf., Snowbird, UT, 1996, pp. 210–219.
[12] P. Howard and J. Vitter, "Analysis of arithmetic coding for data compression," in Proc. Data Compression Conf., Snowbird, UT, 1991, pp. 3–12.
[13] I. Kontoyiannis, "An implementable lossy version of the Lempel–Ziv algorithm—Part I: Optimality for memoryless sources," IEEE Trans. Inform. Theory, vol. 45, pp. 2285–2292, 1999.
[14] D. T. Lee and C. K. Wong, "Worst-case analysis for region and partial region searches in multidimensional binary search trees and balanced quad trees," Acta Inform., vol. 9, pp. 23–29, 1977.
[15] T. Łuczak and W. Szpankowski, "A suboptimal lossy data compression based on approximate pattern matching," IEEE Trans. Inform. Theory, vol. 43, pp. 1439–1451, Sept. 1997.
[16] M. Rabbani and P. Jones, Digital Image Compression Techniques. Bellingham, WA: SPIE, 1991.
[17] Y. Steinberg and M. Gutman, "An algorithm for source coding subject to a fidelity criterion, based on string matching," IEEE Trans. Inform. Theory, vol. 39, pp. 877–886, 1993.
[18] W. Szpankowski, Average Case Analysis of Algorithms on Sequences. New York: Wiley, 2001.
[19] A. J. Wyner, "The redundancy and distribution of the phrase lengths of the fixed-database Lempel–Ziv algorithm," IEEE Trans. Inform. Theory, vol. 43, pp. 1439–1465, Sept. 1997.
[20] A. Wyner and J. Ziv, "Fixed data base version of the Lempel–Ziv data compression algorithm," IEEE Trans. Inform. Theory, vol. 37, pp. 878–880, 1991.
[21] ——, "The sliding window Lempel–Ziv algorithm is asymptotically optimal," Proc. IEEE, vol. 82, pp. 872–877, 1994.
[22] E. H. Yang and J. Kieffer, "On the redundancy of the fixed-database Lempel–Ziv algorithm for φ-mixing sources," IEEE Trans. Inform. Theory, vol. 43, pp. 1101–1111, July 1997.
[23] ——, "On the performance of data compression algorithms based upon string matching," IEEE Trans. Inform. Theory, vol. 44, pp. 47–65, Jan. 1998.
[24] R. Zamir and K. Rose, "Natural type selection in adaptive lossy compression," IEEE Trans. Inform. Theory, vol. 47, pp. 99–111, Jan. 2001.
[25] Z. Zhang and V. Wei, "An on-line universal lossy data compression algorithm via continuous codebook refinement—Part I: Basic results," IEEE Trans. Inform. Theory, vol. 42, pp. 803–821, May 1996.
[26] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Trans. Inform. Theory, vol. IT-23, no. 3, pp. 337–343, 1977.
[27] ——, "Compression of individual sequences via variable-rate coding," IEEE Trans. Inform. Theory, vol. IT-24, pp. 530–536, 1978.

Marc Alzina graduated from the Ecole Polytechnique, France, in 1998, and from the Ecole Nationale Supérieure des Télécommunications in 2000. He studied image and signal processing, aspects of networking such as services and architectures, digital switching, asynchronous networks, and queuing theory, as well as distributed operating systems, open systems, and intercommunication.

He worked on IP stack enhancements in an embedded OS for set-top boxes at OpenTV, Mountain View, CA. He is currently working for a start-up on the specification and development of services in IP-based telephone networks. His research interests cover lossy data compression.


Wojciech Szpankowski (M'87–SM'95) received the M.S. and Ph.D. degrees in electrical and computer engineering from the Technical University of Gdańsk, Poland, in 1976 and 1980, respectively.

Currently, he is a Professor of computer sciences at Purdue University, West Lafayette, IN. Before coming to Purdue, he was an Assistant Professor at the Technical University of Gdańsk. In 1984, he held a Visiting Assistant Professor position at McGill University, Montreal, QC, Canada. During 1992/1993 he was Professeur Invité at the Institut National de Recherche en Informatique et en Automatique, France. In fall 1999, he was a Visiting Professor at Stanford University, Stanford, CA. His research interests cover analysis and design of algorithms, data compression, information theory, performance evaluation, networking, multiaccess protocols, queuing theory, stability problems in stochastic models, probability theory, and analytic combinatorics. On these topics he has authored or co-authored over 100 papers. He wrote the book Average Case Analysis of Algorithms on Sequences (New York: Wiley, 2001). His recent work is mostly devoted to probabilistic analysis and design of algorithms on strings and applications of analytic tools to problems of information theory. He was guest editor for several journals. Currently, he co-edits a special issue on the analysis of algorithms in Random Structures & Algorithms. He is on the editorial boards of Discrete Mathematics and Theoretical Computer Science and of Theoretical Computer Science. In 1999, he co-chaired the Information Theory and Networking Workshop, Metsovo, Greece; in 2000 he chaired the Sixth Seminar on Analysis of Algorithms, Krynica Morska, Poland.

Ananth Grama received the Ph.D. degree in computer sciences from the University of Minnesota, Minneapolis, in 1996.

He is currently an Associate Professor of computer sciences at Purdue University, West Lafayette, IN. His research interests span the areas of parallel and distributed computing infrastructure and applications, large-scale data analysis and handling, and data compression. He has co-authored several papers and the textbook Introduction to Parallel Computing—Design and Analysis of Algorithms (Redwood City, CA: Benjamin Cummings/Addison Wesley, 1994) on related topics.

Dr. Grama is the recipient of several awards including the Vice Chancellor's Gold Medal (University of Roorkee, India, 1989), the Doctoral Dissertation Award (University of Minnesota, Minneapolis, 1993), the NSF CAREER Award (1998), and the Purdue University School of Science Outstanding Assistant Professor Award (1999).