Publication

Bayesian Damage Recognition in Document Images Based on a Joint Global and Local Homogeneity Model

Journal Contribution - Journal Article

Physical damage (such as torn-off regions and scratches) is commonly seen in historical documents. Recognition of such damage is currently absent from digitization-and-information-extraction (DIE) systems but is crucial for automatic document comprehension and exploitation. In this paper we propose a generic damage recognition (DR) method based on joint global and local modeling of the text homogeneity (TH) pattern exhibited in document images. More specifically, a connected component (CC) based formulation is developed as a global homogeneity measure, where TH is characterized using a probabilistic graph model for coarse recognition of damaged regions. A multi-resolution analysis (MRA) of TH is further developed for fine-grained within-CC recognition of damage pixels, where the disparity between damage and text pixels is characterized by exploiting neighborhood transitions. This enables the formulation of a local homogeneity measure, where the neighborhood transition around an individual pixel is modeled using the propagation of the approximation coefficients of a stationary wavelet transform (SWT). The proposed global and local homogeneity measures are integrated as a joint likelihood in a Bayesian model with a Markov random field (MRF) prior, where DR is formulated as a maximum a posteriori (MAP) inference problem, which is addressed using Markov chain Monte Carlo (MCMC) sampling. The resulting algorithm is tested on a set of real-life historical newspaper images containing damage of varying size and shape. Performance is evaluated using both F-measures and the Intersection-over-Union (IoU) metric, and the test results demonstrate the promising potential of the proposed method.
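
The abstract names F-measures and IoU as the evaluation metrics but does not state how they are computed. As a minimal sketch, assuming pixel-level evaluation on binary damage masks (1 = damage, 0 = not damage), the two metrics could be computed as follows. The function names and the toy masks are hypothetical illustrations and are not taken from the paper.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives
    over two binary masks given as equally sized lists of rows."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # damage pixel correctly recognized
            elif p and not t:
                fp += 1          # pixel wrongly marked as damage
            elif t and not p:
                fn += 1          # damage pixel that was missed
    return tp, fp, fn


def iou(pred, truth):
    """Intersection-over-Union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    # Assumption: two empty masks are treated as a perfect match.
    return tp / union if union else 1.0


def f_measure(pred, truth, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    tp, fp, fn = confusion_counts(pred, truth)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Assumption: two empty masks are treated as a perfect match.
    return (1 + b2) * tp / denom if denom else 1.0


# Toy 2x3 masks (hypothetical data): 2 TP, 1 FP, 1 FN.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]

print(iou(predicted, reference))        # 0.5
print(f_measure(predicted, reference))  # 4 / 6, about 0.667
```

On the toy masks the intersection holds 2 pixels and the union 4, so the IoU is 0.5, while F1 is 2·2 / (2·2 + 1 + 1) = 2/3. For the same prediction, F1 is never lower than IoU, which is one reason to report both.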

Journal: Pattern Recognition
ISSN: 0031-3203
Volume: 118
Publication year: 2021
BOF-keylabel: yes
IOF-keylabel: yes
BOF-publication weight: 6
Authors: Regional
Authors from: Government, Higher Education
Accessibility: Open