TY - JOUR
T1 - Improved Chinese sentence semantic similarity calculation method based on multi-feature fusion
AU - Liu, Liqi
AU - Wang, Qinglin
AU - Li, Yuan
N1 - Publisher Copyright:
© Fuji Technology Press Ltd.
PY - 2021/7
Y1 - 2021/7
N2 - In this paper, an improved long short-term memory (LSTM)-based deep neural network structure is proposed for learning the semantic similarity of variable-length Chinese sentences. Siamese LSTM, a sequence-insensitive deep neural network model, has a limited ability to capture the semantics of natural language because it struggles to account for semantic differences that arise from differences in syntactic structure or word order within a sentence. Therefore, the proposed model integrates the syntactic component features of the words in a sentence into the word vector representation layer to express the sentence's syntactic structure and the interdependence between words. Moreover, a relative position embedding layer is introduced into the model, mapping the relative positions of the words in the sentence to a high-dimensional space to capture their local position information. In this model, a parallel structure maps the two sentences into the same high-dimensional space to obtain fixed-length sentence vector representations. After aggregation, the sentence similarity is computed in the output layer. Experiments on Chinese sentences show that the model achieves good results in semantic similarity calculation.
KW - LSTM
KW - Relative position embedding
KW - Semantic similarity
KW - Syntactic component
UR - http://www.scopus.com/inward/record.url?scp=85111490014&partnerID=8YFLogxK
DO - 10.20965/JACIII.2021.P0442
M3 - Article
AN - SCOPUS:85111490014
SN - 1343-0130
VL - 25
SP - 442
EP - 449
JO - Journal of Advanced Computational Intelligence and Intelligent Informatics
JF - Journal of Advanced Computational Intelligence and Intelligent Informatics
IS - 4
ER -