Text extraction based on maximum-minimum similarity training method

Hui Fu; Xia Bi Liu; Yun De Jia

doi:10.3724/SP.J.1001.2008.00621

Hui Fu^*, Xia Bi Liu, Yun De Jia

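^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract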

This paper proposes a maximum-minimum similarity (MMS) training algorithm to optimize the parameters of an effective text extraction method based on Gaussian mixture modeling of neighbor characters. MMS training optimizes recognizer performance by maximizing the similarities of positive samples and minimizing the similarities of negative samples. Based on this approach to discriminative training, the paper defines an objective function for text extraction and uses gradient descent to search for the minimum of that function, and with it the optimal parameters of the text extraction method. Text extraction experiments demonstrate the effectiveness of MMS training. Compared with maximum likelihood estimation of the parameters by the expectation maximization (EM) algorithm, MMS training greatly improves text extraction performance, achieving a recall rate of 98.55% and a precision rate of 93.56%. The experimental results also show that MMS training performs better than the commonly used discriminative training method of minimum classification error (MCE).
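The training idea in the abstract can be illustrated with a small sketch. This is not the paper's formulation: the record does not give the objective function, so the code assumes a squared-error form that pulls the similarity of positive samples toward 1 and that of negative samples toward 0. It also replaces the Gaussian mixture model of neighbor characters with a single one-dimensional Gaussian-shaped similarity. All function names, the learning rate, and the sample values are hypothetical.

```python
import math


def similarity(x, mu, sigma):
    """Toy similarity in (0, 1]: an unnormalized Gaussian bump centered at mu.

    Assumption: stands in for the paper's Gaussian-mixture-based similarity.
    """
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def mms_objective(mu, sigma, positives, negatives):
    """Assumed MMS-style objective: small when positives score near 1
    and negatives score near 0."""
    pos = sum((1.0 - similarity(x, mu, sigma)) ** 2 for x in positives)
    neg = sum(similarity(x, mu, sigma) ** 2 for x in negatives)
    return pos / len(positives) + neg / len(negatives)


def mms_gradient_mu(mu, sigma, positives, negatives):
    """Analytic derivative of the objective with respect to mu.

    Uses ds/dmu = s * (x - mu) / sigma**2 for the toy similarity above.
    """
    grad = 0.0
    for x in positives:
        s = similarity(x, mu, sigma)
        grad += -2.0 * (1.0 - s) * s * (x - mu) / sigma ** 2 / len(positives)
    for x in negatives:
        s = similarity(x, mu, sigma)
        grad += 2.0 * s * s * (x - mu) / sigma ** 2 / len(negatives)
    return grad


def train_mms(positives, negatives, mu0, sigma=1.0, lr=0.1, steps=200):
    """Plain gradient descent on mu; sigma is held fixed for simplicity."""
    mu = mu0
    for _ in range(steps):
        mu -= lr * mms_gradient_mu(mu, sigma, positives, negatives)
    return mu


if __name__ == "__main__":
    positives = [1.8, 2.0, 2.2]    # hypothetical "text" feature values
    negatives = [-0.5, 0.0, 0.5]   # hypothetical "non-text" feature values
    mu = train_mms(positives, negatives, mu0=1.0)
    print("trained mu:", mu)
    print("objective:", mms_objective(mu, 1.0, positives, negatives))
```

In this toy setting the center parameter moves toward the positive samples and away from the negative ones, which is the discriminative behavior the abstract describes. A maximum likelihood fit would use only the positive samples.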

Original language	English
Pages (from-to)	621-629
Number of pages	9
Journal	Ruan Jian Xue Bao/Journal of Software
Volume	19
Issue number	3
DOI	https://doi.org/10.3724/SP.J.1001.2008.00621
Publication status	Published - Mar 2008

Cite this

@article{9421b00565d74339ab38f7431e309d2e,

title = "Text extraction based on maximum-minimum similarity training method",

abstract = "This paper proposes a maximum-minimum similarity training algorithm to optimize the parameters in the effective method of text extraction based on Gaussian mixture modeling of neighbor characters. The maximum-minimum similarity training (MMS) methods optimize recognizer performance through maximizing the similarities of positive samples and minimizing the similarities of negative samples. Based on this approach to discriminative training, it defines the objective function for text extraction, and uses the gradient descent method to search the minimum of the objective function and the optimum parameters for the text extraction method. The experimental results of text extraction show the effectiveness of MMS training in text extraction. Compared with the maximum likelihood estimation of parameters from expectation maximization (EM) algorithm, the training results after MMS has the performance of text extraction improved greatly. The recall rate of 98.55% and the precision rate of 93.56% are achieved. The experimental results also show that the maximum-minimum similarity (MMS) training behaves better than the commonly used discriminative training of the minimum classification error (MCE).",

keywords = "Discriminative training, Gaussian mixture modeling, Maximum-minimum similarity training, Minimum classification error training, Text extraction",

author = "Fu, Hui and Liu, {Xia Bi} and Jia, {Yun De}",

year = "2008",

month = mar,

doi = "10.3724/SP.J.1001.2008.00621",

language = "English",

volume = "19",

pages = "621--629",

journal = "Ruan Jian Xue Bao/Journal of Software",

issn = "1000-9825",

publisher = "Chinese Academy of Sciences",

number = "3",

}

TY - JOUR

T1 - Text extraction based on maximum-minimum similarity training method

AU - Fu, Hui

AU - Liu, Xia Bi

AU - Jia, Yun De

PY - 2008/3

Y1 - 2008/3

AB - This paper proposes a maximum-minimum similarity training algorithm to optimize the parameters in the effective method of text extraction based on Gaussian mixture modeling of neighbor characters. The maximum-minimum similarity training (MMS) methods optimize recognizer performance through maximizing the similarities of positive samples and minimizing the similarities of negative samples. Based on this approach to discriminative training, it defines the objective function for text extraction, and uses the gradient descent method to search the minimum of the objective function and the optimum parameters for the text extraction method. The experimental results of text extraction show the effectiveness of MMS training in text extraction. Compared with the maximum likelihood estimation of parameters from expectation maximization (EM) algorithm, the training results after MMS has the performance of text extraction improved greatly. The recall rate of 98.55% and the precision rate of 93.56% are achieved. The experimental results also show that the maximum-minimum similarity (MMS) training behaves better than the commonly used discriminative training of the minimum classification error (MCE).

KW - Discriminative training

KW - Gaussian mixture modeling

KW - Maximum-minimum similarity training

KW - Minimum classification error training

KW - Text extraction

UR - http://www.scopus.com/inward/record.url?scp=41949115231&partnerID=8YFLogxK

U2 - 10.3724/SP.J.1001.2008.00621

DO - 10.3724/SP.J.1001.2008.00621

M3 - Article

AN - SCOPUS:41949115231

SN - 1000-9825

VL - 19

SP - 621

EP - 629

JO - Ruan Jian Xue Bao/Journal of Software

JF - Ruan Jian Xue Bao/Journal of Software

IS - 3

ER -
