Publication

Probabilistic Homogeneity for Document Image Segmentation

Journal contribution - Journal article

In this paper we propose a novel probabilistic framework for document segmentation that exploits the human perceptual recognition of text regions in complicated layouts. In particular, we conceptualize text homogeneity as the Gestalt pattern displayed in text regions, characterized by proximately and symmetrically arranged units with similar morphological and texture features. We model this pattern in the local region of a connected component (CC) using a hierarchical formulation, which simulates a random walk-and-check on a graph encoding the neighborhood of the CC. The proposed formulation allows an effective computation of what we call the probabilistic local text homogeneity (PLTH) using a weighted summation of the weights of the graph, which are derived from a probabilistic description of the homogeneity between neighboring CCs and computed through Bayesian cue integration. The proposed PLTH enables a multi-aspect analysis, in which various primitives such as geometrical configuration, morphological features, texture characterization and location priors are integrated into one computational probabilistic model. This enables an effective text and non-text classification of CCs preceding any grouping process, a step that is currently absent in document segmentation. Experimental results show that our segmentation method based on the proposed PLTH model improves upon the state of the art.
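The two ideas named in the abstract, Bayesian cue integration and a weighted summation over graph weights, can be illustrated with a minimal sketch. This is not the paper's PLTH formulation, which the abstract does not spell out. It assumes conditionally independent cues fused in naive-Bayes fashion and a simple proximity-weighted average over a CC's neighbors. The function names `fuse_cues` and `local_homogeneity`, the likelihood-ratio inputs and the proximity weights are all hypothetical.

```python
from math import prod


def fuse_cues(likelihood_ratios, prior=0.5):
    """Fuse independent cues into one pairwise homogeneity probability.

    Assumption (not from the paper): each cue contributes a likelihood
    ratio P(cue | homogeneous) / P(cue | not homogeneous), and the cues
    are conditionally independent, so their ratios multiply.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


def local_homogeneity(edge_probs, proximities):
    """Proximity-weighted average of pairwise probabilities over neighbors.

    Assumption (not from the paper): `edge_probs[i]` is the fused
    probability for the edge to neighbor i and `proximities[i]` is a
    non-negative weight for that neighbor.
    """
    total = sum(proximities)
    if total == 0:
        return 0.0  # isolated component: no neighborhood evidence
    return sum(p * w for p, w in zip(edge_probs, proximities)) / total


# Hypothetical example: one CC with two neighbors, each edge scored
# from a geometry cue and a texture cue.
edges = [fuse_cues([2.0, 3.0]), fuse_cues([0.5, 1.0])]
score = local_homogeneity(edges, proximities=[1.0, 0.5])
```

Under these assumptions a CC whose score exceeds a chosen threshold would be labeled text before any grouping step. The threshold and the way the cue likelihoods are estimated are left open here.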
Journal: Pattern Recognit
ISSN: 0031-3203
Volume: 109
Pages: 1-14
Year of publication: 2021
Keywords: Probabilistic local text homogeneity, Random walk-and-check simulation, Bayesian cue integration, Text homogeneity pattern, Document image segmentation
BOF-keylabel: yes
IOF-keylabel: yes
BOF-publication weight: 6
Authors: Regional
Authors from: Higher Education
Accessibility: Open