TY - JOUR
T1 - Leveraging Conceptualization for Short-Text Embedding
AU - Huang, Heyan
AU - Wang, Yashen
AU - Feng, Chong
AU - Liu, Zhirun
AU - Zhou, Qiang
N1 - Publisher Copyright: © 1989-2012 IEEE.
PY - 2018/7/1
Y1 - 2018/7/1
AB - Most short-text embedding models represent each short text using only the literal meanings of its words, which leaves them unable to distinguish among the senses of polysemous words, a ubiquitous phenomenon. To enhance the semantic representation capability of short texts, we (i) propose a novel short-text conceptualization algorithm that assigns associated concepts to each short text, and then (ii) incorporate the conceptualization results into learning conceptual short-text embeddings. The resulting semantic representation is therefore more expressive than widely used text representation models such as latent topic models. The short-text conceptualization algorithm is based on a novel co-ranking framework that lets both kinds of signals (i.e., words and concepts) fully interact to derive a reliable conceptualization of each short text. We further extend the conceptual short-text embedding models with an attention-based model that selects the relevant words within the context to make more efficient predictions. Experiments on real-world datasets demonstrate that the proposed conceptual short-text embedding model and short-text conceptualization algorithm are more effective than state-of-the-art methods.
KW - Short-text conceptualization
KW - co-ranking framework
KW - conceptual short-text embedding
KW - semantic network
UR - http://www.scopus.com/inward/record.url?scp=85040046119&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2017.2787709
DO - 10.1109/TKDE.2017.2787709
M3 - Article
AN - SCOPUS:85040046119
SN - 1041-4347
VL - 30
SP - 1282
EP - 1295
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 7
ER -