Automatic Training of Page Segmentation Algorithms: An Optimization Approach

Mao S, Kanungo T
International Conference on Pattern Recognition. Sept. 2000: 531-534.


Most page segmentation algorithms have user-specifiable free parameters. However, algorithm designers typically do not provide a quantitative, rigorous method for choosing values for these parameters. The free parameter values can drastically affect the segmentation result and depend strongly on the particular dataset to which the algorithm is applied. In this paper, we present an automatic training method for choosing the free parameters of page segmentation algorithms. The automatic training problem is posed as a multivariate non-smooth function optimization problem. An efficient direct search method, the simplex method, is used to solve this optimization problem. The training method is then applied to Kise’s page segmentation algorithm. We find that a set of optimal parameter values and the corresponding performance index can be found using relatively few function evaluations. The experiments were conducted on the UW III dataset.
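
The optimization formulation can be illustrated with a minimal sketch. It assumes that the simplex method referred to here is the Nelder-Mead downhill simplex, a derivative-free direct search. The function `toy_error` is a hypothetical, non-smooth stand-in for the real objective, which would run the segmentation algorithm on training pages and score the result against ground truth. The names `nelder_mead` and `toy_error`, the coefficients, and the stopping rule are illustrative choices and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def nelder_mead(f: Callable[[Sequence[float]], float], x0: Sequence[float],
                step: float = 1.0, max_evals: int = 200,
                tol: float = 1e-6) -> Tuple[List[float], float, int]:
    """Minimize f by Nelder-Mead direct search; returns (point, value, evaluations)."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset by `step` along each axis.
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        simplex.append(p)
    values = [f(p) for p in simplex]
    evals = n + 1

    while evals < max_evals:
        # Sort vertices from best (lowest value) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if abs(values[-1] - values[0]) < tol:
            break
        worst = simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t: float) -> List[float]:
            # Point on the line through the centroid, directed away from the worst vertex.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_ref = f(reflected)
        evals += 1
        if f_ref < values[0]:
            # Reflection beat the best vertex: try expanding further.
            expanded = along(2.0)
            f_exp = f(expanded)
            evals += 1
            if f_exp < f_ref:
                simplex[-1], values[-1] = expanded, f_exp
            else:
                simplex[-1], values[-1] = reflected, f_ref
        elif f_ref < values[-2]:
            simplex[-1], values[-1] = reflected, f_ref
        else:
            # Contract toward the centroid from the worst vertex.
            contracted = along(-0.5)
            f_con = f(contracted)
            evals += 1
            if f_con < values[-1]:
                simplex[-1], values[-1] = contracted, f_con
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                for i in range(1, n + 1):
                    simplex[i] = [b + 0.5 * (x - b) for b, x in zip(best, simplex[i])]
                    values[i] = f(simplex[i])
                evals += n

    best_i = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_i], values[best_i], evals


def toy_error(params: Sequence[float]) -> float:
    # Hypothetical non-smooth "error rate" with kinks at a = 3 and b = -1.
    a, b = params
    return abs(a - 3.0) + 2.0 * abs(b + 1.0)


best_params, best_err, n_evals = nelder_mead(toy_error, [0.0, 0.0])
print(best_params, best_err, n_evals)
```

Only function values are used, never gradients, which is why a search of this kind suits a non-smooth objective. When each evaluation means segmenting an entire training set, the number of evaluations returned as `n_evals` is the cost that matters.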