Abstract

Yasser El Sonbaty
Document Image Matching Using a Maximal Grid Approach
A new approach for form document representation using the maximal grid of its frameset is presented. Using image processing techniques, a scanned form is transformed into a frameset composed of a number of cells. The maximal grid is the grid that encompasses all the horizontal and vertical lines in the form and can be easily generated from the cell coordinates. The number of cells from the original frameset included in each of the cells created by the maximal grid is then calculated. Those numbers are added for each row and column, generating an array representation for the frameset. A novel algorithm for similarity matching of document framesets based on their maximal grid representations is introduced. The algorithm is robust to image noise and to line breaks, which makes it applicable to poor-quality scanned documents. The matching algorithm renders the similarity between two forms as a value between 0 and 1. Thus, it may be used to rank the forms in a database according to their similarity to a query form. Several experiments were performed to demonstrate the accuracy and the efficiency of the proposed approach.
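The representation step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the cell format `(x0, y0, x1, y1)`, the function names, and the reading of "included in" as "a frameset cell covers that grid cell" are all assumptions made for illustration. The similarity matching algorithm itself is not specified in the abstract and is therefore not sketched here.

```python
from bisect import bisect_left

# Assumption: each frameset cell is an axis-aligned rectangle
# (x0, y0, x1, y1) with x0 < x1 and y0 < y1, in pixel coordinates.


def maximal_grid(cells):
    """Return the sorted x and y positions of every cell edge.

    Extending each horizontal and vertical edge across the whole form
    yields the grid lines of the (assumed) maximal grid.
    """
    xs = sorted({x for x0, _, x1, _ in cells for x in (x0, x1)})
    ys = sorted({y for _, y0, _, y1 in cells for y in (y0, y1)})
    return xs, ys


def grid_counts(cells):
    """Count, for each maximal-grid cell, how many frameset cells cover it."""
    xs, ys = maximal_grid(cells)
    counts = [[0] * (len(xs) - 1) for _ in range(len(ys) - 1)]
    for x0, y0, x1, y1 in cells:
        # Every cell edge is a grid line, so bisect_left finds exact indices.
        c0, c1 = bisect_left(xs, x0), bisect_left(xs, x1)
        r0, r1 = bisect_left(ys, y0), bisect_left(ys, y1)
        for r in range(r0, r1):
            for c in range(c0, c1):
                counts[r][c] += 1
    return counts


def array_representation(cells):
    """Sum the per-grid-cell counts along each row and each column."""
    counts = grid_counts(cells)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    return row_sums, col_sums


# Hypothetical frameset: one wide cell on top, one narrow cell below it.
example_cells = [(0, 0, 10, 5), (0, 5, 4, 10)]
rows, cols = array_representation(example_cells)
```

In this sketch the grid lines come directly from the cell coordinates, which matches the abstract's remark that the maximal grid is easy to generate from them. A real system would also need to tolerate small coordinate differences caused by noise, for example by merging nearly equal edge positions before building the grid.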