
Math. Comput. Sci. (2013) 7:155–166. DOI 10.1007/s11786-013-0153-x. Mathematics in Computer Science

Binary Image Compression via Monochromatic Pattern Substitution: Sequential and Parallel Implementations

Luigi Cinque · Sergio De Agostino · Luca Lombardi

Received: 31 December 2012 / Accepted: 26 April 2013 / Published online: 23 May 2013. © Springer Basel 2013

Abstract We present a method for compressing binary images via monochromatic pattern substitution. Such method has no relevant loss of compression effectiveness if the image is partitioned into up to a thousand blocks, approximately, and each block is compressed independently. Therefore, it can be implemented on a distributed system with no interprocessor communication. In the theoretical context of unbounded parallelism, interprocessor communication is needed. Compression effectiveness has a bell-shaped behaviour which is again competitive with the sequential performance when the highest degree of parallelism is reached. Finally, the method has a speed-up if applied sequentially to an image partitioned into up to 256 blocks. It follows that such speed-up can be applied to a parallel implementation on a small scale system.

Keywords Lossless compression · Binary image · Distributed system · Scalability

Mathematics Subject Classification (2000) Distributed Algorithms 68W15 · Image Processing 94A08

1 Introduction

The best lossless image compressors are enabled by arithmetic encoders based on the model driven method [3]. The model driven method consists of two distinct and independent phases: modeling [4] and coding [5]. Arithmetic encoders are the best model driven compressors both for binary images (JBIG) [6] and for grey scale and color images (CALIC) [7], but they are often ruled out because they are too complex.

Preliminary results were presented in [1] and [2].

L. Cinque · S. De Agostino (B)
Computer Science Department, Sapienza University, Via Salaria 113, 00198 Rome, Italy
e-mail: [email protected]

L. Cinque
e-mail: [email protected]

L. Lombardi
Computer Science Department, University of Pavia, Via Ferrara 1, 27100 Pavia, Italy
e-mail: [email protected]


As far as the model driven method for grey scale and color image compression is concerned, the modeling phase consists of three components: the determination of the context of the next pixel, the prediction of the next pixel and a probabilistic model for the prediction residual, which is the value difference between the actual pixel and the predicted one. In the coding phase, the prediction residuals are encoded. The use of prediction residuals for grey scale and color image compression relies on the fact that most of the time there are minimal variations of color in the neighbourhood of one pixel. LOCO-I (JPEG-LS) [8] is the current lossless standard in low-complexity applications and involves Golomb–Rice codes [9,10] rather than the arithmetic ones. A low-complexity application compressing 8 × 8 blocks of a grey-scale or color image by means of a header and a fixed-length code is PALIC [11], which can be implemented on an arbitrarily large scale system at no communication cost. As explained in [11], parallel implementations of LOCO-I would require more sophisticated architectures than a simple array of processors. PALIC achieves 80 percent of the compression obtained with LOCO-I and is an extremely local procedure which is able to achieve a satisfying degree of compression by working independently on very small blocks. On the other hand, no local low-complexity binary image compressor has been designed so far. BLOCK MATCHING [12,13] is the best low-complexity compressor for binary images and extends data compression via textual substitution to two-dimensional data by compressing sub-images rather than substrings [14,15], achieving 80 percent of the compression obtained with JBIG.

We present a method for compressing binary images via monochromatic pattern substitution. Monochromatic rectangles inside the image are compressed by a variable length code. Such monochromatic rectangles are detected by means of a raster scan (row by row). If the 4 × 4 subarray in position (i, j) of the image is monochromatic, then we compute the largest monochromatic rectangle in that position, else we leave it uncompressed. The encoding scheme is to precede each item with a flag field indicating whether there is a monochromatic rectangle or raw data. The procedure for computing the largest monochromatic rectangle with left upper corner in position (i, j) takes O(M log M) time, where M is the size of the rectangle. The positions covered by the detected rectangles are skipped in the linear scan of the image and the sequential time to compress an image of size n by rectangle matching is Ω(n log M). The analysis of the running time of this algorithm involves a waste factor, defined as the average number of matches covering the same pixel. We verified experimentally that the waste factor is less than 2 on realistic image data. Therefore, the heuristic takes O(n log M) time. On the other hand, the decoding algorithm is linear. The compression effectiveness of this technique is about the same as the one of the rectangular block matching technique [13], which still requires Ω(n log M) time. Moreover, compression via monochromatic pattern substitution has no relevant loss of effectiveness if the image is partitioned into up to a thousand blocks, approximately, and each block is compressed independently. Therefore, the computational phase can be implemented on both small and large scale distributed systems with no interprocessor communication. BLOCK MATCHING, instead, does not work locally since it applies a generalized LZ1-type method with an unrestricted window [12–14]. We ran the procedure with up to 32 processors of the 256 Intel Xeon 3.06 GHz processors machine avogadro.cilea.it (http://www.supercomputing.it) on a test set of large topographic bi-level images. We obtained the expected speed-up of the compression and decompression times, achieving parallel running times about twenty times faster than the sequential ones. Similar results were obtained in [16] on the same machine for the version of block matching compressing squares [12]. However, the square block matching technique has a lower effectiveness and it is not scalable.
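
The raster-scan driver just described can be outlined as in the following minimal sketch. The callables find_rectangle, emit_rectangle and emit_raw are hypothetical placeholders (the rectangle search and the variable length code are sketched in Sect. 3), the helper names are ours, and border handling is omitted.

```python
# A minimal sketch of the raster-scan compression loop described above.
# find_rectangle, emit_rectangle and emit_raw are hypothetical placeholders,
# not the authors' code; the last three rows and columns are ignored for brevity.
def is_monochromatic_4x4(image, i, j):
    c = image[i][j]
    return all(image[i + r][j + k] == c for r in range(4) for k in range(4))

def compress(image, find_rectangle, emit_rectangle, emit_raw):
    h, w = len(image), len(image[0])
    covered = [[False] * w for _ in range(h)]
    out = []
    for i in range(h - 3):
        for j in range(w - 3):
            if covered[i][j]:
                continue                            # skip positions inside a match
            if is_monochromatic_4x4(image, i, j):
                width, length, color = find_rectangle(image, i, j)
                out.append(emit_rectangle(width, length, color))
                rows, cols = length, width          # the match covers these pixels
            else:
                out.append(emit_raw(image, i, j))   # flagged 4 x 4 raw block
                rows, cols = 4, 4
            for r in range(i, i + rows):            # mark the covered positions
                for c in range(j, j + cols):
                    covered[r][c] = True
    return ''.join(out)
```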

In the theoretical context of unbounded parallelism, interprocessor communication is needed when we scale up the distributed system. We obtained experimental results on the compression achieved in this theoretical context simply by sequentializing the procedure. It turned out that the compression effectiveness has a bell-shaped behaviour which is again competitive with the sequential performance when the highest degree of parallelism is reached.

Finally, we show that such method has a speed-up if applied sequentially to the partitioned image. Experimental results suggest that the speed-up happens if the image is partitioned into up to 256 blocks and each block is compressed independently in sequence. It follows that the speed-up can also be applied to a parallel implementation on a small scale system. Such speed-up depends on the fact that monochromatic rectangles crossing boundaries between blocks are not computed and, consequently, the waste factor decreases when the number of blocks increases. We refine the partition by splitting the blocks horizontally and vertically and, after four refinements, experiments show that no further improvement is obtained.


Previous work on the square block matching heuristic is reported in Sect. 2. Compression via monochromatic pattern substitution is described in Sect. 3. Section 4 presents the experimental results on the parallel machine. In Sect. 5, we show how parallel implementations can be realized in the theoretical context of unbounded parallelism by means of interprocessor communication. Section 6 presents the experimental results relating scalability to compression effectiveness. Section 7 presents the experimental results of the sequential speed-up. The experimental results of the speed-up applied to small scale parallel computation are shown in Sect. 8. Conclusions and future work are given in Sect. 9.

2 Previous Work

As mentioned in the introduction, the square block matching procedure has been so far the only low-complexity lossless binary image compression technique implemented in parallel at no communication cost [16]. Such technique is a two-dimensional extension of Lempel–Ziv compression [14] and simple practical heuristics exist to implement it by means of hashing techniques [17–20]. The hashing technique used for the two-dimensional extension is even simpler [12]. The square matching compression procedure scans an m × m′ image by a raster scan. A table of 2^16 − 2 entries with one position for each non-monochromatic 4 × 4 subarray is the only data structure used. All-zero and all-one squares are handled differently. The encoding scheme is to precede each item with a flag field indicating whether there is a monochromatic square, a match, or raw data. When there is a match, the 4 × 4 subarray in the current position is hashed to yield a pointer to a copy. This pointer is used for the current greedy match and then replaced in the hash table by a pointer to the current position. The procedure for computing the largest square match with left upper corners in positions (i, j) and (k, h) takes O(M) time, where M is the size of the match. Obviously, this procedure can be used to compute the largest monochromatic square in a given position (i, j) as well. If the 4 × 4 subarray in position (i, j) is monochromatic, then we compute the largest monochromatic square in that position. Otherwise, we compute the largest match in the position provided by the hash table and update the table with the current position. If the subarray is not hashed to a pointer, then it is left uncompressed and added to the hash table with its current position. The positions covered by matches are skipped in the linear scan of the image. We wish to point out that besides the proper matches we call a match every square of the parsing of the image produced by the heuristic. We also call pointer the encoding of a match. Each pointer starts with a flag field indicating whether there is a monochromatic match (0 for the white ones and 10 for the black ones), a proper match (110) or a raw match (111). The complexity of the square block matching technique is linear, but in practice it is slower than monochromatic pattern substitution.
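
As an illustration of the data structure just described, the sketch below packs a 4 × 4 binary subarray into a 16-bit key and performs the lookup-and-replace step of the hash table; the row-major packing order and the function names are assumptions of ours, not the authors' implementation.

```python
# A minimal sketch of the 4 x 4 hashing used by the square block matching
# heuristic; packing order and names are illustrative assumptions.
def key_4x4(image, i, j):
    """Pack the 4 x 4 binary subarray with left upper corner (i, j) into 16 bits."""
    key = 0
    for r in range(4):
        for c in range(4):
            key = (key << 1) | image[i + r][j + c]
    return key

def lookup_and_update(table, image, i, j):
    """Return the stored copy position (or None) and store the current position."""
    key = key_4x4(image, i, j)
    if key in (0, 0xFFFF):          # all-zero and all-one subarrays are handled apart
        return None
    previous = table.get(key)       # pointer to a previous copy, if any
    table[key] = (i, j)             # replace it with the current position
    return previous
```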

An image can be partitioned into up to a hundred areas, approximately, and the square block matching heuristic can be applied independently to each area with no relevant loss of compression effectiveness on both the CCITT bi-level image test set (Fig. 1) and the set of five 4,096 × 4,096 pixels half-tone images of Fig. 2. In fact, independently from the image size, an image partition into about one hundred areas defines sub-images where the order of magnitude of the length of each side is decreased just by one unit. This allows the parallel approach still to compute the monochromatic matches detected by the sequential procedure with a very good approximation. However, square matches do not allow a higher scale of parallelism because the sub-images would be too small to guarantee robust compression. On the other hand, with the approach we present in the next section, employing rectangle monochromatic matches and a variable length code, even a partition of the image into, approximately, one thousand elements guarantees a robust parallel approach to compression.

To conclude, in order to implement decompression on an array of processors, we needed to indicate the end of the encoding of a specific area. So, we changed the encoding scheme by associating the flag field 1110 to the raw match so that 1111 could indicate the end of the sequence of pointers corresponding to a given area. Then, the flag field 1110 is followed by sixteen bits uncompressed, flag fields 0 and 10 by the size of the square side, while flag field 110 is also followed by the position of the matching copy. The values following the flag fields are represented by a fixed length code.


Fig. 1 The CCITT image test set

Fig. 2 Images 1–5 from left to right


Fig. 3 Compression (left) and decompression (right) times of the block matching procedure (cs.)

Experiments were run on the CCITT bi-level image test set (Fig. 1) and on the set of five 4,096 × 4,096 pixels half-tone images of Fig. 2. The compression and decompression times of the parallel procedure applied to the five images of Fig. 2, doubling up the number of processors of the avogadro.cilea.it machine from 1 to 32, are shown in Fig. 3 [16].

3 Compression via Monochromatic Pattern Substitution

The technique scans an m × m′ image by a raster scan. If the 4 × 4 subarray in position (i, j) of the image is monochromatic, then we compute the largest monochromatic rectangle in that position. We denote with p_{i,j} the pixel in position (i, j). The procedure for finding the largest rectangle with left upper corner (i, j) is described in Fig. 4. At the first step, the procedure computes the longest possible width ℓ for a monochromatic rectangle in (i, j) and stores the color in c. The 1 × ℓ rectangle computed at the first step is the current detected rectangle and the sizes of its sides are stored in side1 and side2. In order to check whether there is a better match than the current one, the longest sequence of consecutive pixels with color c is computed on the next row starting from column j. Its length is stored in the temporary variable width and the temporary variable length is increased by one. If the rectangle R whose sides have size width and length is greater than the current one, the current one is replaced by R. We iterate this operation on each row until the area of the current rectangle is greater than or equal to the area of the longest feasible width-wide rectangle, since no further improvement would be possible at that point. For example, in Fig. 5 we apply the procedure to find the largest monochromatic rectangle in position (0,0). A monochromatic rectangle of width 6 is detected at step 1. Then, at step 2 a larger rectangle is obtained which is 2 × 4. At step 3 and step 4 the current rectangle is still 2 × 4 since the longest monochromatic sequence on rows 3 and 4 comprises two pixels. At step 5, another sequence of two pixels provides a larger rectangle which is 5 × 2. At step 6, the procedure stops since the longest monochromatic sequence is just one pixel and the rectangle can cover at most 7 rows. It follows that the detected rectangle is 5 × 2, since a rectangle of width 1 cannot have a larger area. Such procedure for computing the largest monochromatic rectangle in position (i, j) takes O(M log M) time, where M is the rectangle size. In fact, in the worst case a rectangle of size M could be detected on row i, a rectangle of size M/2 on row i + 1, a rectangle of size M/3 on row i + 2 and so on. Since the rectangle size is the product of the width and length, this worst case process ends with a rectangle of size 1 on row i + M and the number of pixels that have been read is given by M times the M-th harmonic sum. Therefore, such case is Θ(M log M).
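
The procedure of Fig. 4 can be sketched as follows, under the assumption that the 4 × 4 subarray at (i, j) has already been found monochromatic; variable names are illustrative.

```python
# A minimal sketch of the procedure of Fig. 4: compute the widest run on row i,
# then extend row by row, capping the feasible width and stopping when the
# current rectangle is at least as large as any rectangle of the current width.
def largest_monochromatic_rectangle(image, i, j):
    c = image[i][j]
    rows, cols = len(image), len(image[0])
    width = 0                                   # longest run of color c on row i
    while j + width < cols and image[i][j + width] == c:
        width += 1
    best_w, best_l = width, 1                   # current detected rectangle
    length = 1
    while i + length < rows and best_w * best_l < width * (rows - i):
        run = 0                                 # run of color c on the next row
        while run < width and image[i + length][j + run] == c:
            run += 1
        if run == 0:
            break
        width = min(width, run)                 # feasible width can only shrink
        length += 1
        if width * length > best_w * best_l:
            best_w, best_l = width, length
    return best_w, best_l, c
```

On the example of Fig. 5 this sketch returns the 5 × 2 rectangle (length 5, width 2) after reading six rows, as described above.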

If the 4 × 4 subarray in position (i, j) of the image is not monochromatic, we do not expand it. The positions covered by the detected rectangles are skipped in the linear scan of the image. The encoding scheme for such rectangles uses a flag field indicating whether there is a monochromatic match (0 for the white ones and 10 for the black ones) or not (11). If the flag field is 11, it is followed by the sixteen bits of the 4 × 4 subarray (raw data). Otherwise, we bound by twelve the number of bits to encode either the width or the length of the monochromatic rectangle. We use either four or eight or twelve bits to encode one rectangle side. Therefore, nine different kinds of rectangle are defined. A monochromatic rectangle is encoded in the following way:


Fig. 4 Computing the largest monochromatic rectangle match in (i, j)

Fig. 5 The largest monochromatic match in (0,0) is computed at step 5

• the flag field indicating the color;
• three or four bits encoding one of the nine kinds of rectangle;
• bits for the length and the width.

Four bits indicate when the length and the width are both encoded by twelve bits or by eight and twelve bits, respectively. Three bits indicate the other seven classes. This classification plays a relevant role for the compression performance since a variable length code is used rather than a code with a fixed length equal to twelve bits. In fact, it wastes four bits when twelve bits are required for the sides but saves four to twelve bits when four or eight bits suffice, which is the most frequent case on realistic data. Such variable length coding technique was checked experimentally as the most suitable and presented in [21] for the rectangular block matching technique.
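
A possible realization of this code is sketched below. The nine classes, the flag fields (0 white, 10 black) and the rule that exactly two classes get four-bit codes follow the text; the concrete codewords and the field order (class code, then length, then width) are illustrative assumptions.

```python
# A minimal sketch of the variable length code for a monochromatic rectangle;
# the codewords and field order are hypothetical, chosen to be prefix-free.
def bits_needed(side):
    if side < 2 ** 4:
        return 4
    if side < 2 ** 8:
        return 8
    return 12                     # sides are bounded so that twelve bits suffice

CLASS_CODE = {                    # (bits for length, bits for width) -> codeword
    (4, 4): '000', (4, 8): '001', (4, 12): '010',
    (8, 4): '011', (8, 8): '100',
    (12, 4): '101', (12, 8): '110',
    (8, 12): '1110', (12, 12): '1111',   # the two classes with four-bit codes
}

def encode_rectangle(width, length, color):
    flag = '0' if color == 0 else '10'   # monochromatic match: white 0, black 10
    bl, bw = bits_needed(length), bits_needed(width)
    return (flag + CLASS_CODE[(bl, bw)]
            + format(length, '0{}b'.format(bl)) + format(width, '0{}b'.format(bw)))
```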


4 Parallel Implementations

The variable length coding technique explained in the previous section has been applied to the CCITT test set and has provided a compression ratio equal to 0.13 on average, where the ratio is given by the size of the compressed data over the size of the raw image. The images of the CCITT test set are 1728 × 2376 pixels. If these images are partitioned into 4^k sub-images and the compression heuristic is applied independently to each sub-image, the compression effectiveness remains about the same for 1 ≤ k ≤ 5, with a 1 percent loss for k = 5. Raw data are associated with the flag field 110, so that we can indicate with 111 the end of the encoding of a sub-image. For k = 6, the compression ratio is still within a few percentage points of the sequential one (Fig. 7). This is because the sub-image is 27 × 37 pixels and it still captures the monochromatic rectangles which belong to the class encoded with four bits for each dimension. These rectangles are the most frequent and give the main contribution to the compression effectiveness. On the other hand, the compression ratio of the square block matching heuristic when applied to the CCITT set is about 0.16 for 1 ≤ k ≤ 3 and it deteriorates relevantly for k ≥ 4.

The compression effectiveness of the variable-length coding technique depends on the sub-image size rather than on the whole image. In fact, if we apply the parallel procedure to the test set of larger binary images, such as the 4,096 × 4,096 pixels half-tone topographic images of Fig. 2, we obtain about the same compression effectiveness for 1 ≤ k ≤ 5 (Fig. 7). The compression ratio is 0.28, with a 2 percent loss for k = 6. This means that the approach without interprocessor communication actually works in the context of unbounded parallelism as long as the elements of the image partition are large enough to capture the monochromatic rectangles encoded with four bits for each dimension. On the other hand, the compression ratio of the square block matching heuristic is 0.31 and depends on how the match size compares to the whole image. In fact, the compression effectiveness is again about the same for 1 ≤ k ≤ 3 and deteriorates in a relevant way for k ≥ 4.

We wish to remark at this point that parallel models have two types of complexity, the interprocessor communication and the input–output mechanism. While the input/output issue is inherent to any parallel algorithm and has standard solutions, the communication cost of the computational phase, after the distribution of the data among the processors and before the output of the final result, is obviously algorithm-dependent. So, we need to limit the interprocessor communication and involve more local computation to design a practical algorithm. Obviously, the compression and decompression procedures described here do not require any interprocessor communication during the computational phase and are implementable on both small (about 100 processors) and large (about 1,000 processors) scale distributed systems. We show in Fig. 6 the compression and decompression times of the parallel procedure applied to the five images of Fig. 2, doubling up the number of processors of the avogadro.cilea.it machine from 1 to 32. We executed the compression and decompression on each image several times. The variances of both the compression and decompression times were small and we report the greatest running times, conservatively. As can be seen from the values in the tables, the variance over the test set is also quite small. The decompression times are faster than the compression ones and in both cases we obtain the expected speed-up, achieving running times about twenty times faster than the sequential ones. Images 3 and 4 have the greatest sequential compression time, while image 2 has the smallest one. Images 3 and 4 also have the greatest sequential decompression time.

Fig. 6 Compression (left) and decompression (right) times of the new procedure (cs.)


The greatest compression time with 32 processors is given by image 5, while the decompression time with 32 processors is the same for all images.
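
A sketch of this no-communication scheme, using a process pool in place of a distributed system, is given below; compress_block is a stub standing for the Sect. 3 heuristic, and all names are illustrative.

```python
# A minimal sketch of the no-communication parallel scheme: the image is split
# into sub-images and each one is compressed independently by a worker.
from multiprocessing import Pool

def compress_block(block):
    # Placeholder: in the real procedure this runs monochromatic pattern
    # substitution on one sub-image and returns its encoding plus the
    # end-of-area flag; here it just returns an empty encoding.
    return ''

def split_into_blocks(image, rows, cols):
    """Partition a list of pixel rows into rows x cols sub-images."""
    h, w = len(image), len(image[0])
    bh, bw = h // rows, w // cols
    return [[r[j * bw:(j + 1) * bw] for r in image[i * bh:(i + 1) * bh]]
            for i in range(rows) for j in range(cols)]

def compress_in_parallel(image, rows=16, cols=16, workers=32):
    blocks = split_into_blocks(image, rows, cols)
    with Pool(workers) as pool:
        encodings = pool.map(compress_block, blocks)   # no communication needed
    return encodings                                   # concatenated by the writer
```

On a real distributed system each worker would of course hold its sub-image locally; the pool is only a stand-in for the array of processors.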

5 A Massively Parallel Algorithm

The compression and decompression parallel procedures presented here require O(α log M) time with O(n/α) processors, for any integer square value α ∈ Ω(log n), on a PRAM EREW and are similar to the ones in [16] for the rectangular block matching heuristic. As in the previous section, we partition an m × m′ image I into x × y rectangular areas, where x and y are Θ(α^{1/2}). In parallel for each area, one processor applies the sequential parsing algorithm so that in O(α log M) time each area is parsed into rectangles, some of which are monochromatic. In the theoretical context of unbounded parallelism, α can be arbitrarily small and before encoding we wish to compute larger monochromatic rectangles. The monochromatic rectangles are computed by merging adjacent monochromatic areas, without considering those monochromatic matches properly contained in some area. We denote with A_{i,j}, for 1 ≤ i ≤ ⌈m/x⌉ and 1 ≤ j ≤ ⌈m′/y⌉, the areas into which the image is partitioned. In parallel for 1 ≤ i ≤ ⌈m/x⌉, if i is odd, a processor merges areas A_{2i−1,j} and A_{2i,j}, provided they are monochromatic and have the same color. The same is done horizontally for A_{i,2j−1} and A_{i,2j}. At the k-th step, if areas A_{(i−1)2^{k−1}+1,j}, A_{(i−1)2^{k−1}+2,j}, ..., A_{i2^{k−1},j}, with i odd, were merged, then they will merge with areas A_{i2^{k−1}+1,j}, A_{i2^{k−1}+2,j}, ..., A_{(i+1)2^{k−1},j}, if they are monochromatic with the same color. The same is done horizontally for A_{i,(j−1)2^{k−1}+1}, A_{i,(j−1)2^{k−1}+2}, ..., A_{i,j2^{k−1}}, with j odd, and A_{i,j2^{k−1}+1}, A_{i,j2^{k−1}+2}, ..., A_{i,(j+1)2^{k−1}}. After O(log M) steps the procedure is completed, and each step takes O(α) time with O(n/α) processors since there is one processor for each area. Therefore, the image parsing phase is realized in O(α log M) time with O(n/α) processors on an exclusive read, exclusive write shared memory machine. As in the previous section, the flag field 110 corresponds to the raw data so that we can indicate with 111 the end of the sequence of pointers corresponding to a given area. Since some areas could be entirely covered by a monochromatic rectangle, 111 is followed by the index associated with the next area by the raster scan. The interested reader can see how the coding and decoding phases are designed in [16], since similar procedures are subroutines of the parallel block matching compression and decompression algorithms. Under some realistic assumptions, implementations with the same parallel complexity can be realized on a full binary tree architecture [16]. Also, the variable length coding technique presented in this paper is described in [21] for the sequential implementation of the rectangular block matching method, but it has not been employed by the theoretical parallel algorithms in [16,21] since it encodes bounded size two-dimensional patterns (moreover, the procedure in [21] produces a coding for which there is no parallel decompressor). However, the bound to the pattern dimensions is large enough with respect to realistic image data. Therefore, we use it in the next section.
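
The doubling merge can be illustrated, for the vertical direction only and simulated sequentially, by the sketch below; labels marks each area as monochromatic white ('W'), black ('B') or mixed (None), and the run lengths play the role of the merged rectangles. The EREW PRAM assigns one processor per area; here the rounds are just loops.

```python
# A minimal sketch of the vertical merging rounds, simulated sequentially.
# runs[i][j] is the number of vertically merged areas whose topmost area is (i, j).
def merge_vertical_runs(labels):
    rows, cols = len(labels), len(labels[0])
    runs = [[1 if labels[i][j] else 0 for j in range(cols)] for i in range(rows)]
    step = 1                                     # group size merged at this round
    while step < rows:
        for j in range(cols):
            for i in range(0, rows, 2 * step):   # i marks the odd-indexed group
                below = i + step
                if (below < rows and runs[i][j] == step and runs[below][j] == step
                        and labels[i][j] == labels[below][j]):
                    runs[i][j] = 2 * step        # the two groups merge
                    runs[below][j] = 0           # the lower group is absorbed
        step *= 2
    return runs
```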

6 Scalability and Compression Effectiveness

Since the images of the CCITT test set are 1728 × 2376 pixels, they can be partitioned into 4^k sub-images for 1 ≤ k ≤ 8. For k = 8, the sub-image is 7 × 9 pixels. The extremal case is when we partition the image into 4 × 4 subarrays. As we mentioned in Sect. 4, the compression ratio is 0.13 on average for 1 ≤ k ≤ 4 and 0.14 for k = 5. For greater values of k, the compression effectiveness deteriorates. For k = 6, k = 7 and k = 8 the compression ratio is 0.17, 0.23 and 0.51, respectively. For the extremal case, the compression ratio is about 0.80. For k ≥ 6 and the extremal case, we obviously have an improvement on the compression results if we use interprocessor communication to compute larger monochromatic rectangles with the procedure explained in the previous section. Figure 7 shows that the improvements are consistent but not satisfactory except for the extremal case. There is an explanation for this. We compute larger monochromatic matches by merging adjacent monochromatic elements of the image partition. This process implies that such matches are likely to be sub-arrays of larger monochromatic rectangles with the margins included in non-monochromatic elements of the partition.


Fig. 7 Compression of the CCITT test set (left) and the topographic images (right)

Yet, these rectangles are likely to be computed as matches by the sequential procedure. Therefore, the smaller the elements are, the better the merging procedure works. A more sophisticated procedure would be to consider the monochromatic matches properly contained in some element, but such an approach is not very practical and does not provide a parallel decoder [21]. As pointed out earlier, the compression effectiveness depends on the sub-image size rather than on the whole image. In fact, if we apply the parallel procedures with and without interprocessor communication to the test set of larger binary images of Fig. 2, we obtain the average results shown in the right table of Fig. 7. As we can see, the compression effectiveness still has a bell-shaped behaviour which maintains the sequential performance up to a higher degree of parallelism with respect to the CCITT test set and is again competitive with the sequential procedure when the highest degree of parallelism is reached in the extremal case. This confirms that the approach without interprocessor communication actually works in the context of unbounded parallelism as long as the elements of the image partition are large enough to capture the monochromatic rectangles encoded with four bits for each dimension. On the other hand, interprocessor communication is effective only in the extremal case.

7 The Sequential Speed-Up

We verified experimentally that, if we partition an image into 4^k sub-images and compress it via monochromatic pattern substitution applied independently and sequentially to each sub-image, the waste factor decreases as k increases. As mentioned in the introduction, the waste factor is less than 2 on realistic image data for k = 0 and decreases to about 1 when k = 4. It follows that if we refine the partition by splitting the blocks horizontally and vertically, after four refinements no further relevant speed-up is obtained (we consider a speed-up relevant if it has the order of magnitude of 1 cs). The experimental results of Fig. 8 show the speed-up obtained if the image is partitioned into up to 256 blocks and each block is compressed independently in sequence. Obviously, there is a similar speed-up for the decompressor, as shown in the right table of Fig. 8. Decompression is about twice as fast as compression.
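
For reference, the waste factor can be estimated from the list of detected matches as in the sketch below; we read "average number of matches covering the same pixel" as the total match area divided by the number of pixels covered by at least one match, and all names are illustrative.

```python
# A minimal sketch estimating the waste factor of a parsing: total area of the
# detected monochromatic matches divided by the number of pixels covered by at
# least one match; rectangles are (i, j, length, width) tuples, (i, j) top left.
def waste_factor(rectangles, height, width):
    covered = [[False] * width for _ in range(height)]
    total_area = 0
    for (i, j, length, w) in rectangles:
        total_area += length * w
        for r in range(i, i + length):
            for c in range(j, j + w):
                covered[r][c] = True
    covered_pixels = sum(row.count(True) for row in covered)
    return total_area / covered_pixels if covered_pixels else 0.0
```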

Fig. 8 Sequential compression (left) and decompression (right) times on the CCITT images (ms.)


Fig. 9 Parallel compression (left) and decompression (right) times on the CCITT images (ms.)

Fig. 10 Sequential compression (left) and decompression (right) times on the topographic images (ms.)

Fig. 11 Parallel compression (left) and decompression (right) times on the topographic images (ms.)

Compression and decompression running times were obtained using a single core of a quad-core machine with an Intel Core 2 Quad Q9300 2.5 GHz CPU and 3.25 GB of RAM.

The sequential speed-up can also be applied to a parallel implementation on a small scale system, since the experimental results show that the speed-up happens for an image partitioned into 4^k sub-images for every positive integer k ≤ 4. The parallel compression and decompression running times on the CCITT image test set are shown in Fig. 9, using the four cores of the quad-core machine. Obviously, a similar experiment could be run using two refinements or one refinement of the partition on a system with 16 or 64 cores, respectively. Experimental results with 16 processors on the set of larger topographic images are shown in the next section.

8 Speeding-Up Parallel Computation

When we consider the test set of 4,096 × 4,096 pixels images of Fig. 2 and these images are partitioned into 4^k sub-images, no further speed-up is obtained for k = 5, as for the CCITT test set. Therefore, the waste factor seems to be determined by the number of refinements, independently from the image size, on realistic data.


We show in Fig. 10 the compression and decompression running times with one processor of the avogadro.cilea.it machine on the five images before and after the partition into 256 blocks. The compression and decompression running times with 16 processors before and after the partition into 256 blocks are shown in Fig. 11. This means that each processor works on a sub-image partitioned into 16 blocks when the number of blocks is 256. Running times are given in milliseconds, which are the time units used for the quad-core experiments, but the centisecond is the actual time unit employed with the avogadro machine and the running time is provided as an integer number.

9 Conclusions

We provided a parallel low-complexity lossless compression technique for binary images which is suitable for both small and large distributed systems at no communication cost. The degree of locality of the technique is high enough to guarantee no loss of compression effectiveness when it is applied independently to blocks with dimensions of about 50 × 50 pixels, as argued on the basis of the experimental results shown on two sets of images of different size. When the blocks are smaller, interprocessor communication is needed, but it is not effective except for the extremal case of blocks with dimensions 4 × 4 pixels. We experimented with at most 32 processors, obtaining the expected linear speed-up. We also showed that the most efficient way to apply monochromatic pattern substitution to binary image compression with a sequential algorithm is to compress independently the 256 blocks of a partitioned input image. Since a speed-up is obtained as well with partitions of lower cardinality, the performance on a small scale distributed system can be improved, as shown by our experimental results with four and sixteen processors. As future work, we wish to experiment with more processors by implementing the procedure on a graphical processing unit [22], as already done in [23] with the compression method for grey scale and color images presented in [11].

References

1. Cinque, L., De Agostino, S., Lombardi, L.: Binary image compression via monochromatic pattern substitution: effectiveness and scalability. In: Proceedings Prague Stringology Conference, pp. 103–115 (2010)

2. Cinque, L., De Agostino, S., Lombardi, L.: Binary image compression via monochromatic pattern substitution: a sequential speed-up. In: Proceedings Prague Stringology Conference, pp. 220–225 (2011)

3. Bell, T.C., Cleary, J.G., Witten, I.H.: Text Compression. Prentice Hall, Englewood Cliffs (1990)
4. Rissanen, J., Langdon, G.G.: Universal modeling and coding. IEEE Trans. Inf. Theory 27, 12–23 (1981)
5. Rissanen, J.: Generalized Kraft inequality and arithmetic coding. IBM J. Res. Dev. 20, 198–203 (1976)
6. Howard, P.G., Kossentini, F., Martins, B., Forchhammer, S., Rucklidge, W.J., Ono, F.: The emerging JBIG2 standard. IEEE Trans. Circuits Syst. Video Technol. 8, 838–848 (1998)
7. Wu, X., Memon, N.D.: Context-based, adaptive, lossless image coding. IEEE Trans. Commun. 45, 437–444 (1997)
8. Weinberger, M.J., Seroussi, G., Sapiro, G.: LOCO-I: a low complexity, context based, lossless image compression algorithm. In: Proceedings IEEE Data Compression Conference, pp. 140–149 (1996)
9. Golomb, S.W.: Run-length encodings. IEEE Trans. Inf. Theory 12, 399–401 (1966)

10. Rice, R.F.: Some practical universal noiseless coding techniques, Part I. Technical Report JPL-79-22, Jet Propulsion Laboratory, Pasadena, California, USA (1979)

11. Cinque, L., De Agostino, S., Liberati, F., Westgeest, B.: A simple lossless compression heuristic for grey scale images. Int. J. Found. Comput. Sci. 16, 1111–1119 (2005)

12. Storer, J.A.: Lossless image compression using generalized LZ1-type methods. In: Proceedings IEEE Data Compression Conference, pp. 290–299 (1996)

13. Storer, J.A., Helfgott, H.: Lossless image compression by block matching. The Comput. J. 40, 137–145 (1997)
14. Lempel, A., Ziv, J.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23, 337–343 (1977)
15. Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. J. ACM 24, 928–951 (1982)
16. Cinque, L., De Agostino, S., Lombardi, L.: Scalability and communication in parallel low-complexity lossless compression. Math. Comput. Sci. 3, 391–406 (2010)
17. Waterworth, J.R.: Data compression system. US Patent 4,701,745 (1987)
18. Brent, R.P.: A linear algorithm for data compression. Aust. Comput. J. 19, 64–68 (1987)
19. Whiting, D.A., George, G.A., Ivey, G.E.: Data compression apparatus and method. US Patent 5,016,009 (1991)
20. Gailly, J., Adler, M.: http://www.gzip.org (1991)


21. Cinque, L., De Agostino, S., Liberati, F.: A work-optimal parallel implementation of lossless image compression by block matching. Nordic J. Comput. 10, 11–20 (2003)

22. Bianchi, L., Gatti, R., Lombardi, L.: The future of parallel computing: GPU vs Cell. General purpose planning against fast graphical computation architecture, which is the best solution for general purpose computation. In: Proceedings GRAPP, pp. 419–425 (2008)

23. Bianchi, L., Gatti, R., Lombardi, L., Cinque, L.: Relative distance method for lossless image compression on parallel architectures: a new approach for lossless image compression on GPU. In: Proceedings VISAPP, pp. 20–25 (2009)