TY - JOUR
T1 - Sentence similarity computational model based on information content
AU - Wu, Hao
AU - Huang, Heyan
N1 - Publisher Copyright: Copyright © 2016 The Institute of Electronics, Information and Communication Engineers.
PY - 2016/6
Y1 - 2016/6
N2 - Sentence similarity computation is an increasingly important task in applications of natural language processing such as information retrieval, machine translation, and text summarization. From the viewpoint of information theory, the essential attribute of natural language is that the carrier of information and the capacity of information can be measured by information content, which has already been used successfully for word similarity computation in simple ways. Existing sentence similarity methods do not emphasize the information contained in the sentence, and the complicated models they employ often require empirical parameters or trained parameters. This paper presents a fully unsupervised computational model of sentence semantic similarity. It is also a simple and straightforward model that neither needs any empirical parameters nor relies on other NLP tools. The method obtains state-of-the-art experimental results, which show that sentence similarity evaluated by the model is closer to human judgment than that of multiple competing baselines. The paper also examines the influence of the external corpus, the performance with semantic nets of various sizes, and the relationship between efficiency and accuracy.
AB - Sentence similarity computation is an increasingly important task in applications of natural language processing such as information retrieval, machine translation, and text summarization. From the viewpoint of information theory, the essential attribute of natural language is that the carrier of information and the capacity of information can be measured by information content, which has already been used successfully for word similarity computation in simple ways. Existing sentence similarity methods do not emphasize the information contained in the sentence, and the complicated models they employ often require empirical parameters or trained parameters. This paper presents a fully unsupervised computational model of sentence semantic similarity. It is also a simple and straightforward model that neither needs any empirical parameters nor relies on other NLP tools. The method obtains state-of-the-art experimental results, which show that sentence similarity evaluated by the model is closer to human judgment than that of multiple competing baselines. The paper also examines the influence of the external corpus, the performance with semantic nets of various sizes, and the relationship between efficiency and accuracy.
KW - Inclusion-exclusion principle
KW - Information content
KW - Information retrieval
KW - Natural language processing
KW - Sentence semantic similarity
UR - http://www.scopus.com/inward/record.url?scp=85009103537&partnerID=8YFLogxK
U2 - 10.1587/transinf.2015EDP7474
DO - 10.1587/transinf.2015EDP7474
M3 - Article
AN - SCOPUS:85009103537
SN - 0916-8532
VL - E99D
SP - 1645
EP - 1652
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
IS - 6
ER -