9
SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 1 Scanned Document Compression Using a Block-based Hybrid Video Codec Alexandre Zaghetto, Member, IEEE, and Ricardo L. de Queiroz, Senior Member, IEEE Abstract—This paper proposes a hybrid pattern matching/transform-based compression method for scanned documents. The idea is to use regular video interframe prediction as a pattern matching algorithm that can be applied to document coding. We show that this interpretation may generate residual data that can be efficiently compressed by a transform-based encoder. The efficiency of this approach is demonstrated using H.264/AVC as a high quality single- and multi-page document compressor. The proposed method, called Advanced Document Coding (ADC), uses segments of the originally independent scanned pages of a document to create a video sequence, which is then encoded through regular H.264/AVC. The encoding performance is unrivaled. Results show that ADC outperforms AVC-I (H.264/AVC operating in pure intra mode) and JPEG2000 by up to 2.7 dB and 6.2 dB, respectively. Superior subjective quality is also achieved. Index Terms—Scanned document compression, advanced doc- ument coding, pattern matching, H.264/AVC. I. I NTRODUCTION C OMPRESSION of scanned documents can be tricky. The scanned document is either compressed as a continuous- tone picture, or it is binarized before compression. The binary document can then be compressed using any available two- level lossless compression algorithm (such as JBIG [1] and JBIG2 [2]), or it may undergo character recognition [3]. Binarization may cause strong degradation to object contours and textures, such that, whenever possible, continuous-tone compression is preferred [4]. In single/multi-page document compression, each page may be individually encoded by some continuous-tone image compression algorithm, such as JPEG [5] or JPEG2000 [6], [7]. Multi-layer approaches such as the mixed raster content (MRC) imaging model [8]–[12] are also challenged by soft edges in scanned documents, often requiring pre- and post-processing [13]. Natural text along a document typically presents repetitive symbols such that dictionary-based compression methods be- come very efficient. For continuous-tone imagery, the recur- rence of similar patterns is illustrated in Fig. 1. Nevertheless, an efficient dictionary-based encoder relying on continuous- tone pattern matching is not that trivial. We propose an encoder that explores such a recurrence through the use of pattern- matching predictors and efficient transform encoding of the residual data. Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. The authors are with the Department of Computer Science, Universidade de Brasilia, Brazil, e-mail: [email protected], [email protected]. Fig. 1. Digitized books usually present recurrent patterns across different pages and across regions of the same page. It is important to place our proposal within the proper scenario. Three premises are assumed. Firstly, we want to avoid complex multi-coder schemes such as MRC. Secondly, the decoder should be as standard as possible. Since we are dealing with scanned compound documents (mixed pictures and text), natural image encoders, such as JPEG2000, are the most adequate. Non-standard encoders, based on fractals [14]– [16], texture prediction [17], [18], template matching [19] or multiscale pattern recurrence [20], [21], are good options out of the scope of what is being proposed. Thirdly, one should provide high quality reconstructed versions of scanned documents. This is especially important if rare books of historical value must be digitally stored, thus discarding optical character recognition (OCR) and token-based methods [2], [10]. In summary, we want a standard single coder approach that operates on natural images and delivers high-quality reconstructed compound documents. The proposed coder makes heavy use of the H.264/AVC standard video coder [22]. H.264/AVC has been well explained in the literature [23]–[27]. H.264/AVC leads to substantial performance improvement when compared to other existing standards [25], [28], such as MPEG-2 [29] and H.263 [30]. Among such improvements we can mention [22], [31]: in- terframe variable block size prediction; arbitrary reference frames; quarter-pel motion estimation; intraframe macroblock prediction; context-adaptive binary arithmetic coding; and in- loop deblocking filter. Results point to at least a factor of two improvement over previous standards. The many cod- ing advances brought into H.264/AVC not only set a new benchmark for video compression, but they also make it a This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641 Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Scanned document compression using block based hybrid video codec

Embed Size (px)

DESCRIPTION

Sybian Technologies Pvt Ltd Final Year Projects & Real Time live Projects JAVA(All Domains) DOTNET(All Domains) ANDROID EMBEDDED VLSI MATLAB Project Support Abstract, Diagrams, Review Details, Relevant Materials, Presentation, Supporting Documents, Software E-Books, Software Development Standards & Procedure E-Book, Theory Classes, Lab Working Programs, Project Design & Implementation 24/7 lab session Final Year Projects For BE,ME,B.Sc,M.Sc,B.Tech,BCA,MCA PROJECT DOMAIN: Cloud Computing Networking Network Security PARALLEL AND DISTRIBUTED SYSTEM Data Mining Mobile Computing Service Computing Software Engineering Image Processing Bio Medical / Medical Imaging Contact Details: Sybian Technologies Pvt Ltd, No,33/10 Meenakshi Sundaram Building, Sivaji Street, (Near T.nagar Bus Terminus) T.Nagar, Chennai-600 017 Ph:044 42070551 Mobile No:9790877889,9003254624,7708845605 Mail Id:[email protected],[email protected]

Citation preview

Page 1: Scanned document compression using block based hybrid video codec

SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 1

Scanned Document Compression Using a

Block-based Hybrid Video CodecAlexandre Zaghetto, Member, IEEE, and Ricardo L. de Queiroz, Senior Member, IEEE

Abstract—This paper proposes a hybrid patternmatching/transform-based compression method for scanneddocuments. The idea is to use regular video interframeprediction as a pattern matching algorithm that can be appliedto document coding. We show that this interpretation maygenerate residual data that can be efficiently compressed bya transform-based encoder. The efficiency of this approachis demonstrated using H.264/AVC as a high quality single-and multi-page document compressor. The proposed method,called Advanced Document Coding (ADC), uses segments ofthe originally independent scanned pages of a document tocreate a video sequence, which is then encoded through regularH.264/AVC. The encoding performance is unrivaled. Resultsshow that ADC outperforms AVC-I (H.264/AVC operating inpure intra mode) and JPEG2000 by up to 2.7 dB and 6.2 dB,respectively. Superior subjective quality is also achieved.

Index Terms—Scanned document compression, advanced doc-ument coding, pattern matching, H.264/AVC.

I. INTRODUCTION

COMPRESSION of scanned documents can be tricky. The

scanned document is either compressed as a continuous-

tone picture, or it is binarized before compression. The binary

document can then be compressed using any available two-

level lossless compression algorithm (such as JBIG [1] and

JBIG2 [2]), or it may undergo character recognition [3].

Binarization may cause strong degradation to object contours

and textures, such that, whenever possible, continuous-tone

compression is preferred [4]. In single/multi-page document

compression, each page may be individually encoded by

some continuous-tone image compression algorithm, such as

JPEG [5] or JPEG2000 [6], [7]. Multi-layer approaches such

as the mixed raster content (MRC) imaging model [8]–[12]

are also challenged by soft edges in scanned documents, often

requiring pre- and post-processing [13].

Natural text along a document typically presents repetitive

symbols such that dictionary-based compression methods be-

come very efficient. For continuous-tone imagery, the recur-

rence of similar patterns is illustrated in Fig. 1. Nevertheless,

an efficient dictionary-based encoder relying on continuous-

tone pattern matching is not that trivial. We propose an encoder

that explores such a recurrence through the use of pattern-

matching predictors and efficient transform encoding of the

residual data.

Copyright (c) 2013 IEEE. Personal use of this material is permitted.However, permission to use this material for any other purposes must beobtained from the IEEE by sending a request to [email protected].

The authors are with the Department of Computer Science,Universidade de Brasilia, Brazil, e-mail: [email protected],[email protected].

Fig. 1. Digitized books usually present recurrent patterns across differentpages and across regions of the same page.

It is important to place our proposal within the proper

scenario. Three premises are assumed. Firstly, we want to

avoid complex multi-coder schemes such as MRC. Secondly,

the decoder should be as standard as possible. Since we are

dealing with scanned compound documents (mixed pictures

and text), natural image encoders, such as JPEG2000, are the

most adequate. Non-standard encoders, based on fractals [14]–

[16], texture prediction [17], [18], template matching [19]

or multiscale pattern recurrence [20], [21], are good options

out of the scope of what is being proposed. Thirdly, one

should provide high quality reconstructed versions of scanned

documents. This is especially important if rare books of

historical value must be digitally stored, thus discarding optical

character recognition (OCR) and token-based methods [2],

[10]. In summary, we want a standard single coder approach

that operates on natural images and delivers high-quality

reconstructed compound documents.

The proposed coder makes heavy use of the H.264/AVC

standard video coder [22]. H.264/AVC has been well explained

in the literature [23]–[27]. H.264/AVC leads to substantial

performance improvement when compared to other existing

standards [25], [28], such as MPEG-2 [29] and H.263 [30].

Among such improvements we can mention [22], [31]: in-

terframe variable block size prediction; arbitrary reference

frames; quarter-pel motion estimation; intraframe macroblock

prediction; context-adaptive binary arithmetic coding; and in-

loop deblocking filter. Results point to at least a factor of

two improvement over previous standards. The many cod-

ing advances brought into H.264/AVC not only set a new

benchmark for video compression, but they also make it a

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641

Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 2: Scanned document compression using block based hybrid video codec

SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 2

formidable compressor for still images [32]. The intraframe

macroblock prediction, combined with the context-adaptive

binary arithmetic coding (CABAC) [33] turns the H.264/AVC

into a powerful still image compression method (i.e. working

on a video sequence composed of one single frame). We refer

to this coder as AVC-I. Gains of the AVC-I over JPEG2000

are typically in the order of 0.25 dB to 0.5 dB in PSNR

for pictorial images [31], [32], [34]. For compound images

(mixture of text and picture) [4], the PSNR gains are more

substantial, even surpassing the mark of 3 dB improvement

over JPEG2000, in some cases [31].

The hypothesis presented is this paper is that a scanned

document encoder that employs state-of-the-art video coding

techniques and generates an H.264/AVC-decodable bit-stream

yields the best rate-distortion performance compared to other

continuous-tone still image compressors. 1.

II. THE PROPOSED METHOD AND ITS IMPLEMENTATION

USING AVC

The proposed document coder has a generic concept and an

implementation based on a stock H.264/AVC video coder. We

now describe the desired features and how one can implement

them using AVC. The generic description may help the reader

to adapt other video coders for that purpose or to develop

non-standard-based (proprietary) variations.

A. Block-based pattern matching

The encoder is based on pattern matching. The document

image is segmented into blocks of pixels. Each block is

matched to an existing pattern in a dictionary which is popu-

lated by the previous contents of the same document. In order

to do that, we partition the scanned document, which may be

made of one or more scanned pages of H×W pixels, into Np

(H/√

Np × W/√

Np pixels) sub-pages or frames. Hence, a

scanned book may be decomposed into many frames. Figure 2

illustrates the page pre-processing (partition) algorithm, while

Fig. 3 shows an example of a frame sequence built from a

3-pages set.

Blocks have for example 16×16 pixels and each one is

matched to an existing pattern in a previous frame. In this way,

the previous frames make a dynamic dictionary of patterns to

look when encoding the present frame, which is continuously

being updated as more frames are encoded. Once a match is

found, the matching pattern is used to predict the block and

the prediction error (residue) is encoded along with the frame

number and position where the match was found (reference

vector). A block can be partitioned into smaller blocks to ease

prediction at the cost of spending more bits to encode reference

vectors. Figure 4 illustrates the effect of using the pattern

matching prediction algorithm. Figures 4 (a) and (b) show

examples of a reference and a current text area, respectively.

Figures 4 (c), (e) and (g) represent the predictions of the

current text using 16×16, 8×8 and 4×4-pixel block partitions.

Figures 4 (d), (f) and (h) are the corresponding residual data.

1Preliminary results of the proposed method over multi-page text-onlydocuments have been presented at a conference [35].

0 1 2

3 4 5

6 7 8

9 10 11

Fig. 2. A document page is partitioned into segments (labeled in sequence).Each one is considered a frame and can be sequentially encoded.

Fig. 3. Example of a frame sequence, built from a 3 pages set, Np = 4

frames/page. Frames 1 to 4, 5 to 8, and 9 to 12 are built from pages 1, 2 and3, respectively.

Notice that the 4×4-pixel prediction generates a lower-energy

residual, when compared to the 16× 16 and 8× 8 prediction,

however, they require encoding more reference vectors.

In this context, video coders often use motion estimation

techniques which are essentially the same as pattern matching.

The H.264/AVC is capable of partitioning macroblocks of 16×

16 pixels into any valid combination of blocks of 16 × 8,

8× 16, 8× 8, 4× 8, 8× 4, and 4× 4 pixels. Our algorithm is

then to feed the document frames as video frames into AVC

since its motion estimation algorithm will take care of the

pattern matching search for us. However, motion estimation

algorithms always take advantage of the fact that video content

at the same frame position in neighbor frames are typically

very correlated. Since this is not our case, it is advisable to

make the search window to cover as much as possible of the

reference frames, or the whole frame, in order to enrich the

dictionary and to remove the spatial dependency.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641

Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 3: Scanned document compression using block based hybrid video codec

SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 3

(a) (b)

(c) (d)

(e) (f)

(g)

2

(h)

Fig. 4. Illustration of approximate pattern matching using interframeprediction: (a) reference text; (b) current text; (c), (e) and (g) predicted text(block size: 16×16, 8×8 and 4×4 pixels, respectively); and (d), (f) and (h)prediction residue (block size: 16×16, 8×8 and 4×4 pixels, respectively).Each zoomed image patch has 178× 178 pixels.

B. Inter- and intra-frame prediction

AVC also allows for intra-frame prediction, in which a block

(partitioned or not) can be predicted from neighboring blocks

by means of directional extrapolation of the border pixels.

The decision to use or not intra-frame prediction is typically

based on rate-distortion optimization (RDO) and we use RDO

in all our simulations. However, AVC does not allow for

in-frame motion vectors (IFMV), but many variations using

such a feature and other sophisticated methods of intra-frame

prediction do exist [19]. Apparently, HEVC will also support

IFMV [36]. Breaking up the pages into frames allows for some

intra-document prediction similar to IFMV, yet using a stock

video coder. Furthermore, the information derived from IFMV

is typically very small compared to all compressed data, such

that the advantage should not be much relevant. Because of

that, we do not use IFMV.

Another issue is the random access to different book pages.

In order to get to a book page, we are forced to decompress

all the frames it uses as reference. So, if random access is an

issue, we suggest to periodically use no-reference frames, i.e.

frames in which inter-frame prediction is not allowed, relying

on pure intra-frame prediction/extrapolation.

In our encoder, using the AVC structure, each block-

partitioning combination and prediction mode is tested and the

best one is picked through RDO. With RDO within AVC, in

the k-th configuration test in a macroblock, AVC computes the

rate Rk (bits spent to encode the block) and distortion Dk (sum

of absolute differences - SAD) achieved by reconstructing the

block. One picks the block partition method that minimizes

Jk = Rk + λDk.

The process is then repeated for every macroblock. As usual,

λ controls compression ratios and is varied to find the RD

curves in our simulations.

C. Residual coding

The residual macroblock, i.e. the prediction error, is trans-

formed using 4 8×8-pixel discrete cosine transform (DCT) or

an integer approximation of it. The transformed blocks are

quantized and encoded using arithmetic coding. H.264/AVC

uses an integer transform with similar properties as the DCT

and the resulting transformed coefficients are quantized and

entropy encoded using CABAC.

D. Compound documents and region classification

Compound document compression usually segments the

image into regions and classifies each one as containing text

and graphics or images (or halftones, for instance). Once

a region is classified, it can be encoded using a proper

algorithm. This approach is driven by objects such as text

characters so that regions of the image are labeled based on

our estimate of its contents. Our method, however, is driven

by the compression itself. Rather than only testing pattern-

matching-based prediction for every block partition, we also

test prediction by extrapolating neighboring blocks, as in

”intra-prediction´´ in H.264/AVC. The RD-optimized selection

of the best prediction assures that the best option is picked.

Text and graphics shall contain recurrent patterns and will

be often encoded using patterns from previous regions, while

pictorial regions may resort to intra-prediction. In this sense,

segmentation is embedded into the encoding process. In fact,

the block prediction and RDO may have the same effect of

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641

Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 4: Scanned document compression using block based hybrid video codec

SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 4

Fig. 5. Configuration parameters that have greater influence on the encoderperformance: Rf (number of reference frames) and Sr (search range).

a segmentation map, even though benefiting the compression

process, and not the true identification of image contents.

E. Encoder and decoder summary

In our concept, the frames are fed into AVC in sequence

just like in a regular video coder. Because of that relation to

AVC, we refer to the proposed method as advanced document

coding, or, simply, ADC. In a nutshell, ADC operation can

be summarized as (i) break the book pages into frames; (ii)

feed all frames to H.264/AVC resulting in an AVC-compatible

stream; (iii) decode the bit stream; and (iv) assemble the

decoded frames into the final document book pages.

In order to work well, H.246/AVC should operate in ”High”

profile, following an IPPP... framework. The encoder should

periodically use no-reference I-frames in the case random

access is desired. RDO should be turned on. Motion estimation

should be set to full search over a window that is as large

as possible. Note that other video coders such as HEVC

and MPEG-2 will also work, even though achieving different

performance levels due to their different sophistication levels.

III. EXPERIMENTAL RESULTS

In our tests, different page sets are compressed using

JPEG2000, AVC-I (H.264/AVC operating in pure intra mode)

and the proposed ADC. The reason we chose JPEG2000 and

AVC-I for comparison is that these are the most suitable

standards that would meet the three premises presented in

Section I.

Distortion metrics based on visual models such as Structural

Similarity (SSIM) [37] and Video Quality Metric (VQM) [38]

have been extensively tested for pictorial content. However,

they are unproven for text and graphics, which rely more on

resolution than on number of gray levels. Readability is very

important and some alternative metrics such as OCR efficiency

are considered. A good objective metric to reflect subjective

perception of text has not been well explored yet. Hence, we

opted to stick to the traditional PSNR as a distortion metric.

In JPEG2000 and AVC-I compression, the pages are sepa-

rately encoded. As for ADC, the first frame of the sequence

is encoded as an I-frame (only intraframe prediction modes

are used) and all the remaining frames are encoded as P-

frames (in addition to intraframe prediction, only past frames

0.2 0.3 0.4 0.5 0.6 0.7 0.8

20

25

30

35

40

45

Bitrate (bpp)

PS

NR

(dB

)

Sequence "guita" (Np = 4, S

r = 32 and R

f = 1, 3, 5)

ADC (Sr = 32, R

f = 5)

ADC (Sr = 32, R

f = 3)

ADC (Sr = 32, R

f = 1)

AVC−I

JPEG2000

(a)

0.2 0.3 0.4 0.5 0.6 0.7 0.8

20

25

30

35

40

45

Bitrate (bpp)

PS

NR

(dB

)

Sequence "guita" (Np = 4, S

r = 08, 16, 32 and R

f = 5)

ADC (Sr = 32, R

f = 5)

ADC (Sr = 16, R

f = 5)

ADC (Sr = 08, R

f = 5)

AVC−I

JPEG2000

(b)

Fig. 6. Comparison between JPEG2000, AVC-I and the proposed ADC, fordifferent combinations of search ranges (Sr) and number of reference frames(Rf ) for test sequence “guita”. The number of frames/page (Np) is 4.

are used as reference by the interframe prediction). We also

considered that each page may be segmented into Np = 4

frames, Np = 16 frames, or not segmented at all (Np = 1,

for multi-page documents only). Two configuration parameters

have greater influence on the encoder performance. One is

the number of reference frames, Rf , the other is the search

range, Sr, as illustrated in Fig. 5. Initially, we evaluated the

effect of choosing different values for Sr and Rf . Figure 6

shows PSNR plots comparing JPEG2000, AVC-I and ADC

(Np = 4 frames/page), for different combinations of Sr and

Rf . The PSNR was calculated using the global mean square

error (MSE). The higher the Sr and Rf values, the better the

rate-distortion performance. In particular, for Sr = 32 pixels

and Rf = 5 frames, ADC outperforms AVC-I by more than 2

dB and JPEG2000 by more than 5 dB, at 0.5 bit/pixel (bpp).

Our test set is composed by 18 documents divided into the

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641

Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 5: Scanned document compression using block based hybrid video codec

SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 5

TABLE IAVERAGE OBJECTIVE (PSNR IN DB) IMPROVEMENT OVER EXISTING

STANDARDS FOR THE 4 DOCUMENT TEST SETS.

Document Set 0 1 2 3

JPEG2000 6.26 5.61 4.47 2.41

AVC-I 2.75 2.58 1.56 0.89

following 4 classes2:

• Class 0: 6 multi-page text-only documents;

• Class 1: 6 single-page text-only documents;

• Class 2: 3 multi-page compound documents; and

• Class 3: 3 single-page compound documents.

Figure 7 illustrates one example of each class. Examples

of PSNR plots are shown in Figs. 8. Figure 9 shows average

PSNR improvement of ADC over JPEG2000 and AVC-I for

each of the document class test sets. In all cases, JPEG2000

and AVC-I are objectively outperformed, considering Sr = 32,

Rf = 5 and Np = 16. Figure 10 (a) shows a zoomed

part of the original “cerrado” sequence. Its encoded and

reconstructed versions using AVC-I, JPEG2000 and ADC,

at approximately 0.25 bits/pixel, are shown in Figs. 10 (b),

(c) and (d), respectively. ADC also yields superior subjective

quality. As a reference, in Table I we present the gains

of the proposed method over AVC-I and JPEG2000 using

Bjontegaard’s method [39] applied to the curves in Fig. 9. As

one can see from the table and from the graphics, the gains

are very substantial.

Our software-based tests using the popular and efficient

x246 implementation of the H.264/AVC standard and the

Kakadu implementation of JPEG 2000, indicate that ADC

(AVC running with 5 reference frames, slowest search, 32×32

window, in IPPPP... mode) is near 10× slower than JPEG

2000. x264 in an Intel Core i7 platform has been shown to

encode RGB video at a rate of 3 to 30 million pixels per

second (Mps). The variation is due to the various x264 settings

that affect speed and quality. Of course, in order to do that,

the system was be dedicated to the task, as an appliance. A

scanned letter-sized (8.5× 11 in) page at 600 pixels per inch

(ppi) yields about 33 million pixels. Hence, we can expect a

page compression speed roughly in the order of 5 to 50 pages

per minute (ppm). This page rate may be acceptable for many

on-the-fly applications and is definitely reasonable for off-line

compression of books and such. A rigorous complexity study

of the encoding algorithms presented here is beyond the scope

of this paper.

IV. CONCLUSIONS

In this paper, we presented a pattern matching/transform-

based encoder for scanned documents named ADC. The reason

why we decided to use H.264/AVC tools to implement the

proposed method is because its interframe prediction scheme

allied with RDO yield an efficient pattern matching algorithm.

2The entire test set is available at http://image.unb.br/queiroz/testset

In addition, the intraframe prediction, the DCT-based trans-

form and the CABAC contribute to improve the encoding

efficiency.

In essence, our work can be summarized as splitting the

document into many pages, forming frames, and feeding the

frames to AVC. Despite the simplicity of the idea, the perfor-

mance for scanned documents is unrivaled, to our knowledge.

Results show that ADC objectively outperforms AVC-I and

JPEG2000 by up to 2.7 dB and 6.2 dB, respectively, with more

significant gains observed for multi-page text-only documents.

Furthermore, the encoder outputs documents with superior

subjective quality. Replacing H.264/AVC by HEVC in ADC

would yield even larger gains.

REFERENCES

[1] JBIG, “Information Technology - Coded Representation of Picture andAudio Information - Progressive Bi-level Image Compression. ITU-TRecommendation T.82,” Mar. 1993.

[2] JBIG2, “Information Technology - Coded Representation of Picture andAudio Information - Lossy/Lossless Coding of Bi-level Images. ITU-TRecommendation T.88,” Mar. 2000.

[3] S. Mori, C.Y. Suen, and K.; Yamamoto, “Historical review of OCRresearch and development,” Proceedings of the IEEE, vol. 80, no. 7, pp.1029–1058, Jul. 1992.

[4] R. L. de Queiroz, Compressing Compound Documents, in the Document

and Image Compression Handbook, by M. Barni, Marcel-Dekker, EUA,2005.

[5] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compres-

sion Standard, Chapman and Hall, 1993.

[6] JPEG, “Information Technology - JPEG2000 Image Coding System -Part 1: Core Coding System. ISO/IEC 15444-1,” 2000.

[7] D. S. Taubman and M. W. Marcellin, JPEG 2000: Image Compression

Fundamentals, Standards and Practice, Kluwer Academic, EUA, 2002.

[8] MRC, “Mixed Raster Content (MRC). ITU-T Recommendation T.44,”1999.

[9] R. L. de Queiroz, R. Buckley, and M. Xu, “Mixed Raster Content(MRC) model for compound image compression,” Proc. of SPIE Visual

Communications and Image Processing, vol. 3653, pp. 1106–1117, Jan.1999.

[10] P. Haffner, P. G. Howard, P. Simard, Y. Bengio, and Y. Lecun, “Highquality document image compression with DjVu,” Journal of Electronic

Imaging, vol. 7, pp. 410–425, 1998.

[11] G. Feng and C. A. Bouman, “High-quality MRC document coding,”IEEE Trans. on Image Processing, vol. 15, no. 10, pp. 3152–3169, Oct.2006.

[12] A. Zaghetto, R. L de Queiroz, and D. Mukherjee, “MRC compressionof compound documents using threshold segmentation, iterative data-filling and H.264/AVC-INTRA,” Proc. Indian Conference on Computer

Vision, Graphics and Image Processing, Dec. 2008.

[13] A. Zaghetto and R. L de Queiroz, “Improved layer processing for MRCcompression of scanned documents,” Proc. of IEEE Intl. Conference on

Image Processing, pp. 1993 – 1996, Nov. 2009.

[14] E. Walach and E. Karnin, “A fractal-based approach to image com-pression,” in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal

Processing, vol. 11, pp. 529 – 532, Apr. 1986.

[15] S.M. Kocsis, “Fractal-based image compression,” in Proc. 23rd.

Asilomar Conf. on Signals, Systems and Computers, vol. 1, pp. 177–181, 1989.

[16] A. Wakatani, “Improvement of adaptive fractal image coding on GPUs,”in Proc. IEEE Intl. Conf. on Consumer Electronics, pp. 255 –256, Jan.2012.

[17] D.C. Garcia and R.L. de Queiroz, “Least-squares directional intraprediction in h.264/avc,” IEEE Signal Processing Letters, vol. 17, no.10, pp. 831–834, Oct. 2010.

[18] Y. Liu L. Liu and E. Delp, “Enhanced intra prediction using contex-tadaptive linear prediction,” in Proc. of PCS 2007 - Picture Coding

Symp., Nov. 2007.

[19] C. Lan, J. Xu, F. Wu, and G. Shi, “Intra frame coding with templatematching prediction and adaptive transform,” in Proc. IEEE Intl. Conf.

Image Processing,, pp. 1221 –1224, Sep., 2010.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641

Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 6: Scanned document compression using block based hybrid video codec

SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 6

(a) (b) (c) (d)

Fig. 7. Examples of documents used in our experiments: (a) class 0: multi-page text-only documents; (b) class 1: single-page text-only documents; (c) class2: multi-page compound documents; and (d) class 3: single-page compound documents.

[20] E. B. Lima Filho, E. A. B. da Silva, M. B. de Carvalho, and F. S. Pinage,“Universal image compression using multiscale recurrent patterns withadaptive probability model,” IEEE Trans. on Image Processing, vol. 17,no. 4, pp. 512–527, Apr. 2008.

[21] N. Francisco, N. Rodrigues, E. da Silva, M. de Carvalho, S. de Faria, andV. da Silva, “Scanned compound document encoding using multiscalerecurrent patterns,” IEEE Trans. on Image Processing, vol. 19, no. 10,pp. 2712–2724, Apr. 2010.

[22] JVT, “Advanced Video Coding for Generic Audiovisual Services. ITU-TRecommendation H.264,” Nov. 2007.

[23] I. E. G. Richardson, H.264 and MPEG-4 video compression, Wiley,EUA, 2003.

[24] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overviewof the H.264/AVC video coding standard,” IEEE Trans. on Circuits and

Systems for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.

[25] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan,“Rate-constrained coder control and comparison of video coding stan-dards,” IEEE Trans. on Circuits and Systems for Video Technology, vol.13, no. 7, pp. 688–703, July 2003.

[26] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira,T. Stockhammer, and T. Wedi, “Video Coding with H.264/AVC: Tools,Performance, and Complexity,” IEEE Circuits and Systems Magazine,vol. 4, no. 1, pp. 7–28, Mar. 2004.

[27] G. J. Sullivan, P. Topiwala, and A. Luthra, “The H.264/AVC AdvancedVideo Coding Standard: Overview and Introduction to the Fidelity RangeExtensions,” Proc. of SPIE Conference on Applications of Digital Image

Processing XXVII, Special Session on Advances in the New Emerging

Standard: H.264/AVC, vol. 5558, pp. 53–74, Aug. 2004.

[28] N. Kamaci and Y. Altunbasak, “Performance comparison of theemerging H.264 video coding standard with the existing standards,”Proc. Intl. Conf. on Multimedia and Expo, vol. 1, pp. 345–348, July2003.

[29] B. G. Haskell, A. Puri, and A. N. Netravalli, Digital Video: An

Introduction to MPEG-2, Chapman and Hall, EUA, 1997.

[30] ITU-T, “Video Coding for Low Bit Rate Communication. ITU-TRecommendation H.263,” Version 1: Nov. 1995, Version 2: Jan. 1998,Version 3: Nov. 2000.

[31] R. L. de Queiroz, R. S. Ortis, A. Zaghetto, and T. A. Fonseca, “Fringebenefits of the H.264/AVC,” Proc. of International Telecommunications

Symposium, pp. 208–212, Sep. 2006.

[32] D. Marpe, V. George, and T. Wiegand, “Performance comparison ofintra-only H.264/AVC and JPEG2000 for a set of monochrome ISO/IECtest images,” Contribution JVT ISO/IEC MPEG and ITU-T VCEG, Doc.

JVT M-014, Oct. 2004.

[33] D. Marpe, H. Schwarz, and T. Wiegand, “Context-based adaptive binaryarithmetic coding in the H.264/AVC video compression standard,” IEEE

Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp.620 – 636, July 2003.

[34] A. Al, B. P. Rao, S. S. Kudva, S. Babu, D. Sumam, and A. V.Rao, “Quality and complexity comparison of H.264 intra mode withJPEG2000 and JPEG,” Proc. of IEEE International Conference on Image

Processing, vol. 1, pp. 24–27, Oct. 2004.

[35] A. Zaghetto and R. L de Queiroz, “High-quality scanned book compres-sion using pattern matching,” Proc. of IEEE International Conference

on Image Processing, pp. 26 – 29, Sep. 2010.[36] K. Ugur et. al. , “High performance, low complexity video coding and

the emerging HEVC standard,” IEEE Trans. on Circuits and Systems

for Video Technology, vol. 20, no. 12, pp. 1688–1697, Dec. 2010.[37] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image

quality assessment: From error visibility to structural similarity,” IEEE

Transactios on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.[38] M.H. Pinson, and S. Wolf, “A new standardized method for objectively

measuring video quality,” IEEE Transactions on Broadcasting, vol.50,no.3, pp. 312- 322, Sept. 2004.

[39] G. Bjontegaard, “Calculation of average PSNR differences betweenRD-curves,” presented at the 13th VCEG-M33 Meeting, Austin, TX,Apr. 2001.

Alexandre Zaghetto received the Engineer degreein 2002, from the Federal University of Rio deJaneiro, Rio de Janeiro, Brazil, the M.Sc. degreein 2004, from the University of Brasilia, Brasilia,Brazil, and and the Ph.D. degree in 2009 also fromthe University of Brasilia, all in Electrical Engi-neering. In 2009, he became Associate Professor atthe Computer Science Department at University ofBrasilia. His main research interests are in image andvideo processing, compound document coding, bio-metrics, fuzzy logic and artificial neural networks.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641

Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 7: Scanned document compression using block based hybrid video codec

SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 7

0.2 0.3 0.4 0.5 0.6 0.7 0.8

25

30

35

40

45

Bitrate(bpp)

PS

NR

(dB

)Comparison between coders: "guita"

ADC − 16 frames/page

ADC − 4 frames/page

ADC − 1 frame/page

AVC−I

JPEG2000

(a)

0.2 0.4 0.6 0.8

25

30

35

40

Bitrate(bpp)

PS

NR

(dB

)

Comparison between coders: page 2 of "guita"

ADC − 16 frames/page

ADC − 4 frames/page

AVC−I

JPEG2000

(b)

0.3 0.4 0.5 0.6 0.7 0.8

30

35

40

Bitrate(bpp)

PS

NR

(dB

)

Comparison between coders: "paper"

ADC − 16 frames/page

ADC − 4 frames/page

AVC−I

JPEG2000

(c)

0.4 0.6 0.8 1

28

30

32

34

36

38

40

42

Bitrate(bpp)

PS

NR

(dB

)

Comparison between coders: "carta"

ADC − 16 frames/page

ADC − 4 frames/page

AVC−I

JPEG2000

(d)

Fig. 8. Examples of PSNR plots for: (a) class 0 (multi-page, text-only), document “guita” (number of pages: 2, size: 1568 × 1024 pixels); (b) class 1(single-page, text-only), page 2 of document “guita”(1568× 1024 pixels); (c) class 2 (multi-page, compound), document “paper” (number of pages: 4, size:2304 × 1632 pixels); and (d) class 3 (single-page, compound), document “carta” (2152 × 1632 pixels). Search range and number of reference frames areSr = 32 and Rf = 5, respectively.

Ricardo L. de Queiroz received the Engineer de-gree from Universidade de Brasilia , Brazil, in 1987,the M.Sc. degree from Universidade Estadual deCampinas, Brazil, in 1990, and the Ph.D. degreefrom The University of Texas at Arlington , in 1994,all in Electrical Engineering.

In 1990-1991, he was with the DSP researchgroup at Universidade de Brasilia, as a researchassociate. He joined Xerox Corp. in 1994, wherehe was a member of the research staff until 2002.In 2000-2001 he was also an Adjunct Faculty at

the Rochester Institute of Technology. He joined the Electrical EngineeringDepartment at Universidade de Brasilia in 2003. In 2010, he became a FullProfessor at the Computer Science Department at Universidade de Brasilia.Dr. de Queiroz has published over 150 articles in Journals and conferencesand contributed chapters to books as well. He also holds 46 issued patents.He is an elected member of the IEEE Signal Processing Society’s MultimediaSignal Processing (MMSP) Technical Committee and a former member of theImage, Video and Multidimensional Signal Processing (IVMSP) TechnicalCommittee. He is a past editor for the EURASIP Journal on Image andVideo Processing, IEEE Signal Processing Letters, IEEE Transactions on

Image Processing, and IEEE Transactions on Circuits and Systems for VideoTechnology. He has been appointed an IEEE Signal Processing SocietyDistinguished Lecturer for the 2011-2012 term.

Dr. de Queiroz has been actively involved with the Rochester chapter ofthe IEEE Signal Processing Society, where he served as Chair and organizedthe Western New York Image Processing Workshop since its inception until2001. He is now helping organizing IEEE SPS Chapters in Brazil andjust founded the Brasilia IEEE SPS Chapter. He was the General Chairof ISCAS’2011, and MMSP’2009, and is the General Chair of SBrT’2012.He was also part of the organizing committee of ICIP’2002. His researchinterests include image and video compression, multirate signal processing,and color imaging. Dr. de Queiroz is a Senior Member of IEEE, a memberof the Brazilian Telecommunications Society and of the Brazilian Society ofTelevision Engineers.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641

Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 8: Scanned document compression using block based hybrid video codec

SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 8

0.2 0.4 0.6 0.8

26

28

30

32

34

36

38

40

42

Bitrate(bpp)

Ave

rag

e P

SN

R(d

B)

Average over class 0

ADC − 16 frames/page

AVC−I

JPEG2000

(a)

0.4 0.6 0.8 1

26

28

30

32

34

36

38

40

42

Bitrate(bpp)

Ave

rag

e P

SN

R(d

B)

Average over class 1

ADC − 16 frames/page

AVC−I

JPEG2000

(b)

0.4 0.6 0.8 1 1.2

26

28

30

32

34

36

38

40

Bitrate(bpp)

Ave

rag

e P

SN

R(d

B)

Average over class 2

ADC − 16 frames/page

AVC−I

JPEG2000

(c)

0.4 0.6 0.8 1 1.2 1.4 1.6

28

30

32

34

36

38

40

Bitrate(bpp)

Ave

rag

e P

SN

R(d

B)

Average over class 3

ADC − 16 frames/page

AVC−I

JPEG2000

(c)

Fig. 9. Comparison of ADC against JPEG2000 and AVC-I in terms of PSNR averaged for documents in: (a) class 0; (b) class 1; (c) class 2; and (d) class 3.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641

Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 9: Scanned document compression using block based hybrid video codec

SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 9

(a) (b)

(c) (d)

Fig. 10. Subjective comparison among coders: (a) zoomed part of “cerrado” sequence; reconstructed versions using (b) AVC-I, (c) JPEG2000 and (d) ADC,at approximately 0.25 bits/pixels. ADC yields superior subjective quality.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641

Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].