TY - GEN
T1 - Which Words Pillar the Semantic Expression of a Sentence?
AU - Zhang, Cheng
AU - Cao, Jingxu
AU - Yan, Dongmei
AU - Song, Dawei
AU - Lv, Jinxin
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - In the realm of machine learning, a profound understanding of sentence semantics holds paramount importance for various applications, notably text classification. Traditionally, this comprehension has been entrusted to deep learning models, despite their computationally intensive nature, particularly when dealing with lengthy sequences. The nuanced impact of individual words within a sentence on semantic expression necessitates a strategic removal of less pertinent words to alleviate the computational burden of the model. Presently, prevailing approaches for word removal predominantly employ methods such as truncation, stop-word elimination and attention mechanisms. Regrettably, these techniques often lack a robust theoretical foundation concerning semantics and interpretability. To bridge this conceptual gap, our study introduces the concept of 'Semantic Pillar Words' (SPW) within a sentence, anchored in a Semantic Euclidean space. Here, the semantics of a word are represented as a constellation of semantic points, with a text sequence encapsulating the convex hull of these semantic points of words. We propose a novel method for Semantic Pillar Word extraction, known as 'SPW-Conv', which dynamically and interpretably prunes text segments, striving to preserve the semantic pillars inherent in the original text. Our extensive experimentation encompasses three diverse text classification datasets, revealing that SPW-Conv outperforms existing methods. Remarkably, it becomes evident that retaining less than 80% of the words within a sentence suffices to capture its semantics adequately, all while achieving classification accuracy levels comparable to those obtained using the entire original text.
KW - Convex hull
KW - Natural Language Processing
KW - Semantic Pillar Words
UR - http://www.scopus.com/inward/record.url?scp=85182404786&partnerID=8YFLogxK
U2 - 10.1109/ICTAI59109.2023.00121
DO - 10.1109/ICTAI59109.2023.00121
M3 - Conference contribution
AN - SCOPUS:85182404786
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 791
EP - 798
BT - Proceedings - 2023 IEEE 35th International Conference on Tools with Artificial Intelligence, ICTAI 2023
PB - IEEE Computer Society
T2 - 35th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2023
Y2 - 6 November 2023 through 8 November 2023
ER -