1. Problem

• Many archived two-sided manuscript documents suffer from bleed-through;

• Bleed-through can be effectively removed offline using image-processing algorithms;

• A remotely located researcher may want to access both original and corrected versions of a document;

• We want to avoid sending the document twice, since both versions are very similar.

[Figure panels: Recto; Verso]

3. Algorithm Details

• We assume that the continuous recto and verso image coordinate frames are related by a six-parameter affine transformation

• We search for a parameter vector that gives the best match between the recto and the transformed flipped verso, in the least-squares sense

• We identify the registered verso image

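$$(A_p f)(x, y) = f(p_{11}x + p_{12}y + p_{13},\; p_{21}x + p_{22}y + p_{23})$$

$$\hat{t} = \arg\min_{t} \sum_{m} \sum_{n} \left( f_r[m,n] - (A_t f_v^{\dagger})[m,n] \right)^2$$

$$f_v[m,n] = (A_{\hat{t}} f_v^{\dagger})[m,n]$$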
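The least-squares search can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the poster: images are small nested lists of grey levels, the affine map is sampled with nearest-neighbour rounding, samples that fall outside the verso are skipped (so the cost is averaged over the overlap rather than summed), and the search covers only a hypothetical grid of integer translations with the linear part fixed to the identity. All function names are illustrative.

```python
from itertools import product


def affine_sample(img, p, m, n):
    """Sample img at the affine image of (m, n); return None if outside.

    p = (p11, p12, p13, p21, p22, p23), applied as
    (m, n) -> (p11*m + p12*n + p13, p21*m + p22*n + p23).
    """
    p11, p12, p13, p21, p22, p23 = p
    u = round(p11 * m + p12 * n + p13)
    v = round(p21 * m + p22 * n + p23)
    if 0 <= u < len(img) and 0 <= v < len(img[0]):
        return img[u][v]
    return None


def match_cost(recto, verso_flipped, p):
    """Mean squared difference between recto and the transformed flipped verso."""
    total, count = 0.0, 0
    for m in range(len(recto)):
        for n in range(len(recto[0])):
            sample = affine_sample(verso_flipped, p, m, n)
            if sample is not None:
                total += (recto[m][n] - sample) ** 2
                count += 1
    return total / count if count else float("inf")


def register_translation(recto, verso_flipped, max_shift=2):
    """Exhaustive search over integer translations (assumed search grid)."""
    best_p, best_cost = None, float("inf")
    for dm, dn in product(range(-max_shift, max_shift + 1), repeat=2):
        p = (1.0, 0.0, float(dm), 0.0, 1.0, float(dn))
        cost = match_cost(recto, verso_flipped, p)
        if cost < best_cost:
            best_p, best_cost = p, cost
    return best_p, best_cost
```

A full six-parameter search would replace the translation grid with a continuous optimiser and interpolated sampling; the cost function keeps the same form.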

Registration

4. Joint Compression

• Based on existing standards

• Original, uncorrected image compressed with standard efficient compression scheme such as JPEG or JPEG 2000

• Segmentation map compressed using efficient bilevel compression scheme, such as JBIG or JBIG2

• Additional information for inpainting transmitted as side information
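To show why a single transmission can serve both views, the following sketch rebuilds the corrected image at the receiver from the three decoded components. It does not implement JPEG, JPEG 2000, JBIG or JBIG2. It assumes the decoders have already produced a grey-level image and a bilevel map marking bleed-through-only pixels, and that the side information is a single background grey level. The function name and argument layout are hypothetical.

```python
def reconstruct_corrected(original, r2_mask, fill_value):
    """Rebuild the corrected image from the decoded components.

    original   : decoded grey-level image (list of rows)
    r2_mask    : decoded bilevel map, 1 where a pixel is bleed-through only
    fill_value : side information, assumed here to be one grey level
    """
    return [
        [fill_value if flag else pixel for pixel, flag in zip(row, mask_row)]
        for row, mask_row in zip(original, r2_mask)
    ]
```

The receiver keeps `original` unchanged for the uncorrected view and calls `reconstruct_corrected` only when the corrected view is requested.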

[Figure: components of the jointly compressed representation, combined with "+"; sizes shown: 4.6 Mbit and 131 kbit]

2. Bleed-through Removal

• We assume the existence of underlying recto and verso images without bleed-through. These consist of the background with the writing superimposed.

• These ideal recto and verso images are combined in some way to produce the observed recto and verso images corrupted with bleed-through (see above).

• In general, the scanned recto and verso images (with bleed-through) will not be aligned.

$f_{cb}(x, y)$, $f_{cw}(x, y)$

Recto and flipped verso images superimposed

Model

Segmentation

• We segment each side of the document into the four regions R1-R4. The most important step is to correctly identify region R2, ‘bleed-through only’. If we miss some parts of R2, bleed-through will remain. If the label R2 is incorrectly assigned to parts of R1, ‘foreground only’, or R4, ‘foreground and bleed-through’, then parts of the desired writing will be erased.

1. We first identify points that can definitely be considered background (R3), because they are lighter than a certain threshold.

2. We then identify points that can be considered foreground (R1), because they are darker than the corresponding points on the other side.

3. Of the remaining points, those whose correlation between the two sides exceeds a correlation threshold are deemed to be bleed-through (R2). The rest are assigned to R4.
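The three steps can be sketched as a per-pixel decision rule. This is only an illustration under stated assumptions: grey levels run from 0 (dark) to 255 (light), the two sides are already registered, the correlation is a Pearson correlation over a small square window, and the threshold values and window radius are placeholders rather than values from the poster.

```python
from math import sqrt


def local_correlation(a, b, m, n, radius=1):
    """Pearson correlation of images a and b over a window centred on (m, n)."""
    xs, ys = [], []
    for i in range(max(0, m - radius), min(len(a), m + radius + 1)):
        for j in range(max(0, n - radius), min(len(a[0]), n + radius + 1)):
            xs.append(a[i][j])
            ys.append(b[i][j])
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var) if var > 0 else 0.0


def segment(side, other_side, bg_threshold=180, corr_threshold=0.5):
    """Label each pixel of `side` as R1, R2, R3 or R4 (assumed thresholds)."""
    labels = []
    for m, row in enumerate(side):
        out = []
        for n, value in enumerate(row):
            if value > bg_threshold:
                out.append("R3")  # step 1: light enough to be background
            elif value < other_side[m][n]:
                out.append("R1")  # step 2: darker than the other side
            elif local_correlation(side, other_side, m, n) > corr_threshold:
                out.append("R2")  # step 3: the two sides are correlated
            else:
                out.append("R4")  # remainder: foreground and bleed-through
        labels.append(out)
    return labels
```

Calling `segment(recto, verso_registered)` labels the recto; swapping the arguments labels the verso.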

[Figure panels: Original with bleed-through; With bleed-through removal]

Algorithm

• Registration: Alignment of recto and flipped verso

• Segmentation: Four regions

1. R1: Foreground only

2. R2: Bleed-through only

3. R3: Background

4. R4: Foreground and bleed-through overlap

• Inpainting: Region R2 filled in with estimate of background

Recto and flipped verso images, superimposed after registration

Illustration of four types of regions

Inpainting applied to circled region

Inpainting

• Points labelled R2 ‘bleed-through’ are replaced by suitable nearby points from the background region R3. In the initial work, a fixed value was used.
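One simple way to pick a "suitable nearby point" is sketched below. The selection rule is an assumption, not the poster's method: for each R2 pixel, the code grows a square window and copies the first R3 pixel it meets in raster order, falling back to a fixed value when none is found within `max_radius`. Labels are assumed to be the strings "R1" to "R4", and the function names are illustrative.

```python
def nearby_background(image, labels, m, n, max_radius):
    """Return a background (R3) value from the smallest window containing one."""
    rows, cols = len(image), len(image[0])
    for radius in range(1, max_radius + 1):
        for i in range(max(0, m - radius), min(rows, m + radius + 1)):
            for j in range(max(0, n - radius), min(cols, n + radius + 1)):
                if labels[i][j] == "R3":
                    return image[i][j]
    return None


def inpaint(image, labels, max_radius=5, fallback=255):
    """Replace every R2 pixel; all other pixels are copied unchanged."""
    out = [row[:] for row in image]
    for m, row in enumerate(labels):
        for n, label in enumerate(row):
            if label == "R2":
                value = nearby_background(image, labels, m, n, max_radius)
                out[m][n] = fallback if value is None else value
    return out
```

Setting `max_radius=0` reduces this to the fixed-value fill mentioned for the initial work.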

5. Conclusion

• Bleed-through can be effectively removed by jointly processing the recto and verso sides of a document.

• More complex bleed-through removal algorithms can be used at the server side, with the result transmitted to the remote user.

• It is not necessary to separately transmit original and corrected versions to a user who wishes to see both.

• All elements can be incorporated into JPEG 2000.

• More work needs to be done on the segmentation and inpainting aspects of the algorithm.