Sentence similarity computational model based on information content

Hao Wu; Heyan Huang

doi:10.1587/transinf.2015EDP7474

Hao Wu, Heyan Huang^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Sentence similarity computation is an increasingly important task in natural language processing applications such as information retrieval, machine translation, and text summarization. From the viewpoint of information theory, the essential attribute of natural language is that it is a carrier of information, and the capacity of that information can be measured by information content, which has already been used successfully for word similarity computation in simple ways. Existing sentence similarity methods do not emphasize the information contained in the sentence, and the complicated models they employ often require empirical or trained parameters. This paper presents a fully unsupervised computational model of sentence semantic similarity. It is also a simple and straightforward model that neither needs any empirical parameter nor relies on other NLP tools. The method obtains state-of-the-art experimental results, which show that sentence similarity evaluated by the model is closer to human judgment than that of multiple competing baselines. The paper also tests the proposed model with respect to the influence of the external corpus, performance with semantic nets of various sizes, and the relationship between efficiency and accuracy.
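The abstract refers to information content as a measure already used for word similarity. As a rough illustration of that background idea only, and not of the sentence-level model proposed in the paper, the sketch below computes IC(c) = -log p(c) over a tiny taxonomy and derives a Lin-style word similarity from it. The taxonomy, the counts, and all function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and made-up corpus counts.
PARENT = {"dog": "animal", "cat": "animal", "animal": "entity",
          "car": "entity", "entity": None}
COUNTS = {"dog": 30, "cat": 20, "animal": 10, "car": 40, "entity": 0}


def ancestors(concept):
    """Return the concept followed by all of its ancestors, nearest first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def concept_frequency(concept):
    """Count occurrences of the concept and of everything it subsumes."""
    return sum(n for c, n in COUNTS.items() if concept in ancestors(c))


TOTAL = concept_frequency("entity")  # the root subsumes every concept


def information_content(concept):
    """IC(c) = -log p(c); rarer, more specific concepts score higher."""
    return -math.log(concept_frequency(concept) / TOTAL)


def lin_similarity(a, b):
    """Lin-style similarity: 2 * IC(lowest common subsumer) / (IC(a) + IC(b))."""
    ancestors_of_b = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in ancestors_of_b)
    denom = information_content(a) + information_content(b)
    return 2 * information_content(lcs) / denom if denom else 1.0


if __name__ == "__main__":
    print(lin_similarity("dog", "cat"))
    print(lin_similarity("dog", "car"))
```

In this toy setting, two words that share a specific subsumer ("dog" and "cat" under "animal") score higher than two words that share only the root. How the paper extends information content from words to whole sentences is not described in the abstract and is not attempted here.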

Original language	English
Pages (from-to)	1645-1652
Number of pages	8
Journal	IEICE Transactions on Information and Systems
Volume	E99D
Issue number	6
DOIs	https://doi.org/10.1587/transinf.2015EDP7474
Publication status	Published - Jun 2016

Keywords

Inclusion-exclusion principle
Information content
Information retrieval
Natural language processing
Sentence semantic similarity

Access to Document

10.1587/transinf.2015EDP7474

Cite this

@article{ec78537abd36417583ce779ba2190b12,

title = "Sentence similarity computational model based on information content",

abstract = "Sentence similarity computation is an increasingly important task in natural language processing applications such as information retrieval, machine translation, and text summarization. From the viewpoint of information theory, the essential attribute of natural language is that it is a carrier of information, and the capacity of that information can be measured by information content, which has already been used successfully for word similarity computation in simple ways. Existing sentence similarity methods do not emphasize the information contained in the sentence, and the complicated models they employ often require empirical or trained parameters. This paper presents a fully unsupervised computational model of sentence semantic similarity. It is also a simple and straightforward model that neither needs any empirical parameter nor relies on other NLP tools. The method obtains state-of-the-art experimental results, which show that sentence similarity evaluated by the model is closer to human judgment than that of multiple competing baselines. The paper also tests the proposed model with respect to the influence of the external corpus, performance with semantic nets of various sizes, and the relationship between efficiency and accuracy.",

keywords = "Inclusion-exclusion principle, Information content, Information retrieval, Natural language processing, Sentence semantic similarity",

author = "Hao Wu and Heyan Huang",

note = "Publisher Copyright: Copyright {\textcopyright}2016 The Institute of Electronics, Information and Communication Engineers.",

year = "2016",

month = jun,

doi = "10.1587/transinf.2015EDP7474",

language = "English",

volume = "E99D",

pages = "1645--1652",

journal = "IEICE Transactions on Information and Systems",

issn = "0916-8532",

publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",

number = "6",

}

TY - JOUR

T1 - Sentence similarity computational model based on information content

AU - Wu, Hao

AU - Huang, Heyan

PY - 2016/6

Y1 - 2016/6

N2 - Sentence similarity computation is an increasingly important task in natural language processing applications such as information retrieval, machine translation, and text summarization. From the viewpoint of information theory, the essential attribute of natural language is that it is a carrier of information, and the capacity of that information can be measured by information content, which has already been used successfully for word similarity computation in simple ways. Existing sentence similarity methods do not emphasize the information contained in the sentence, and the complicated models they employ often require empirical or trained parameters. This paper presents a fully unsupervised computational model of sentence semantic similarity. It is also a simple and straightforward model that neither needs any empirical parameter nor relies on other NLP tools. The method obtains state-of-the-art experimental results, which show that sentence similarity evaluated by the model is closer to human judgment than that of multiple competing baselines. The paper also tests the proposed model with respect to the influence of the external corpus, performance with semantic nets of various sizes, and the relationship between efficiency and accuracy.

AB - Sentence similarity computation is an increasingly important task in natural language processing applications such as information retrieval, machine translation, and text summarization. From the viewpoint of information theory, the essential attribute of natural language is that it is a carrier of information, and the capacity of that information can be measured by information content, which has already been used successfully for word similarity computation in simple ways. Existing sentence similarity methods do not emphasize the information contained in the sentence, and the complicated models they employ often require empirical or trained parameters. This paper presents a fully unsupervised computational model of sentence semantic similarity. It is also a simple and straightforward model that neither needs any empirical parameter nor relies on other NLP tools. The method obtains state-of-the-art experimental results, which show that sentence similarity evaluated by the model is closer to human judgment than that of multiple competing baselines. The paper also tests the proposed model with respect to the influence of the external corpus, performance with semantic nets of various sizes, and the relationship between efficiency and accuracy.

KW - Inclusion-exclusion principle

KW - Information content

KW - Information retrieval

KW - Natural language processing

KW - Sentence semantic similarity

UR - http://www.scopus.com/inward/record.url?scp=85009103537&partnerID=8YFLogxK

U2 - 10.1587/transinf.2015EDP7474

DO - 10.1587/transinf.2015EDP7474

M3 - Article

AN - SCOPUS:85009103537

SN - 0916-8532

VL - E99D

SP - 1645

EP - 1652

JO - IEICE Transactions on Information and Systems

JF - IEICE Transactions on Information and Systems

IS - 6

ER -
