Abstract
This paper proposes a maximum-minimum similarity (MMS) training algorithm to optimize the parameters of an effective text extraction method based on Gaussian mixture modeling of neighbor characters. MMS training optimizes recognizer performance by maximizing the similarities of positive samples and minimizing the similarities of negative samples. Based on this approach to discriminative training, the paper defines an objective function for text extraction and uses the gradient descent method to search for the minimum of that function, and thus for the optimum parameters of the text extraction method. Experimental results show the effectiveness of MMS training in text extraction. Compared with maximum likelihood estimation of the parameters by the expectation maximization (EM) algorithm, MMS training greatly improves text extraction performance. A recall rate of 98.55% and a precision rate of 93.56% are achieved. The experimental results also show that MMS training performs better than the commonly used minimum classification error (MCE) discriminative training.
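The training idea summarized in the abstract can be illustrated with a small sketch. The abstract does not give the paper's objective function, so everything below is an assumption made for illustration: scalar features, a similarity defined as a weighted sum of unnormalized Gaussian kernels, a squared-error objective that pushes positive similarities toward 1 and negative similarities toward 0, and gradient descent applied to the component means only. All function names and the toy data are hypothetical.

```python
import math

def similarity(x, means, sigmas, weights):
    """Assumed similarity: weighted sum of unnormalized Gaussian kernels.
    With weights summing to 1 the value lies in (0, 1]."""
    return sum(w * math.exp(-((x - m) ** 2) / (2.0 * s * s))
               for m, s, w in zip(means, sigmas, weights))

def objective(pos, neg, means, sigmas, weights):
    """Assumed objective (not the paper's): small when positive samples
    have high similarity and negative samples have low similarity."""
    f_pos = sum((1.0 - similarity(x, means, sigmas, weights)) ** 2 for x in pos) / len(pos)
    f_neg = sum(similarity(x, means, sigmas, weights) ** 2 for x in neg) / len(neg)
    return f_pos + f_neg

def gradient_wrt_means(pos, neg, means, sigmas, weights):
    """Analytic gradient of the assumed objective with respect to each mean."""
    grads = [0.0] * len(means)
    for samples, is_pos in ((pos, True), (neg, False)):
        for x in samples:
            s = similarity(x, means, sigmas, weights)
            # Derivative of (1 - s)^2 is -2(1 - s); derivative of s^2 is 2s.
            outer = (-2.0 * (1.0 - s) if is_pos else 2.0 * s) / len(samples)
            for k, (m, sg, w) in enumerate(zip(means, sigmas, weights)):
                kernel = w * math.exp(-((x - m) ** 2) / (2.0 * sg * sg))
                # d(kernel)/d(mean) = kernel * (x - mean) / sigma^2
                grads[k] += outer * kernel * (x - m) / (sg * sg)
    return grads

def train(pos, neg, means, sigmas, weights, lr=0.1, steps=200):
    """Plain gradient descent on the means; sigmas and weights stay fixed."""
    means = list(means)
    for _ in range(steps):
        grads = gradient_wrt_means(pos, neg, means, sigmas, weights)
        means = [m - lr * g for m, g in zip(means, grads)]
    return means

# Toy data (hypothetical): positives cluster near 1 and 3, negatives elsewhere.
pos = [0.9, 1.0, 1.1, 2.9, 3.0, 3.1]
neg = [1.9, 2.0, 2.1, 5.0]
init_means, sigmas, weights = [0.5, 3.5], [0.5, 0.5], [0.5, 0.5]

before = objective(pos, neg, init_means, sigmas, weights)
trained_means = train(pos, neg, init_means, sigmas, weights)
after = objective(pos, neg, trained_means, sigmas, weights)
print("objective before:", before, "after:", after)
print("trained means:", trained_means)
```

In this toy setup the negatives near 2 pull the trained means slightly away from the positive cluster centers, which is the discriminative effect that a purely likelihood-based fit to the positives alone would not show.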
Original language | English |
---|---|
Pages (from-to) | 621-629 |
Number of pages | 9 |
Journal | Ruan Jian Xue Bao/Journal of Software |
Volume | 19 |
Issue number | 3 |
Publication status | Published - Mar 2008 |
Keywords
- Discriminative training
- Gaussian mixture modeling
- Maximum-minimum similarity training
- Minimum classification error training
- Text extraction