TY - GEN
T1 - Hierarchical text-label integrated attention network for document classification
AU - Gong, Changjin
AU - Shi, Kaize
AU - Niu, Zhendong
N1 - Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/6/22
Y1 - 2019/6/22
N2 - Recurrent neural networks (RNN) and convolutional neural networks (CNN) have been extensively used in text classification to capture local and long-range dependencies. Recent work has demonstrated the superiority of self-attention networks (SAN) owing to their highly parallelizable computation and excellent performance. However, SAN has difficulty capturing meaningful semantic relationships over very long sequences, and its memory requirement grows rapidly with the sequence length. To address these limitations of SAN in processing long document sequences, this paper proposes four novel ideas and builds a hierarchical text-label integrated attention network (HLAN). Firstly, a hierarchical architecture is introduced to mirror the hierarchy of a document, which effectively shortens the sequence length at each processing step. Secondly, attention weights are calculated in the joint embedding space of text and labels. Thirdly, a multi-head soft attention is proposed to compress the sequence encoded by self-attention into a single vector. Finally, a loss term called class loss is introduced and combined with the cross-entropy loss. HLAN achieves competitive results against strong baseline methods on 4 out of 5 benchmark datasets, which verifies the effectiveness of HLAN for document classification in terms of both accuracy and memory requirement.
AB - Recurrent neural networks (RNN) and convolutional neural networks (CNN) have been extensively used in text classification to capture local and long-range dependencies. Recent work has demonstrated the superiority of self-attention networks (SAN) owing to their highly parallelizable computation and excellent performance. However, SAN has difficulty capturing meaningful semantic relationships over very long sequences, and its memory requirement grows rapidly with the sequence length. To address these limitations of SAN in processing long document sequences, this paper proposes four novel ideas and builds a hierarchical text-label integrated attention network (HLAN). Firstly, a hierarchical architecture is introduced to mirror the hierarchy of a document, which effectively shortens the sequence length at each processing step. Secondly, attention weights are calculated in the joint embedding space of text and labels. Thirdly, a multi-head soft attention is proposed to compress the sequence encoded by self-attention into a single vector. Finally, a loss term called class loss is introduced and combined with the cross-entropy loss. HLAN achieves competitive results against strong baseline methods on 4 out of 5 benchmark datasets, which verifies the effectiveness of HLAN for document classification in terms of both accuracy and memory requirement.
KW - Class loss
KW - Document classification
KW - Hierarchical
KW - Memory requirement
KW - Self-attention networks
KW - Text-label integrated
UR - http://www.scopus.com/inward/record.url?scp=85071599806&partnerID=8YFLogxK
U2 - 10.1145/3341069.3342987
DO - 10.1145/3341069.3342987
M3 - Conference contribution
AN - SCOPUS:85071599806
T3 - ACM International Conference Proceeding Series
SP - 254
EP - 260
BT - HPCCT 2019 - 3rd High Performance Computing and Cluster Technologies Conference and BDAI 2019 - 2nd International Conference on Big Data and Artificial Intelligence
PB - Association for Computing Machinery
T2 - 3rd High Performance Computing and Cluster Technologies Conference, HPCCT 2019 and the 2nd International Conference on Big Data and Artificial Intelligence, BDAI 2019
Y2 - 22 June 2019 through 24 June 2019
ER -