TY - JOUR
T1 - Introducing bidirectional attention for autoregressive models in abstractive summarization
AU - Zhao, Jianfei
AU - Sun, Xin
AU - Feng, Chong
N1 - Publisher Copyright:
© 2024 Elsevier Inc.
PY - 2025/1
Y1 - 2025/1
N2 - Abstractive summarization methods typically follow the autoregressive paradigm, using causal masks in the decoder for training and inference efficiency. However, this approach leads to a constant context throughout the generation process, which conflicts with the bidirectional characteristics of natural language. Although previous attempts have been made to incorporate bidirectional attention into the decoding process through non-autoregressive approaches, their evaluation results are not comparable to those of autoregressive methods. To bring bidirectional attention to the autoregressive process while maintaining superior performance, we propose the global autoregressive paradigm, which takes the outputs of the autoregressive process as additional inputs in the subsequent global iteration. Specifically, we build a bidirectional decoder alongside the original encoder and decoder to capture the bidirectional context of the outputs. This context is updated after each autoregressive decoding iteration. The decoder then integrates the updated context into subsequent autoregressive decoding steps, enhancing the generative process with a more comprehensive and authentic context. Additionally, we use contrastive learning to train the model to extract reliable features from the bidirectional context and apply reinforcement learning to improve the model's utilization of this context. We evaluate our method on the CNN/DM, XSum, and NYT datasets, and the results highlight the significance of the bidirectional context. Our method achieves the best ROUGE-2 performance on CNN/DM (23.96) and performs comparably on XSum (25.45) and NYT (27.91). It also outperforms all baselines in terms of BERTScore, with scores of 89.96 on CNN/DM, 92.70 on XSum, and 90.04 on NYT. Furthermore, our method performs better with a larger beam size.
AB - Abstractive summarization methods typically follow the autoregressive paradigm, using causal masks in the decoder for training and inference efficiency. However, this approach leads to a constant context throughout the generation process, which conflicts with the bidirectional characteristics of natural language. Although previous attempts have been made to incorporate bidirectional attention into the decoding process through non-autoregressive approaches, their evaluation results are not comparable to those of autoregressive methods. To bring bidirectional attention to the autoregressive process while maintaining superior performance, we propose the global autoregressive paradigm, which takes the outputs of the autoregressive process as additional inputs in the subsequent global iteration. Specifically, we build a bidirectional decoder alongside the original encoder and decoder to capture the bidirectional context of the outputs. This context is updated after each autoregressive decoding iteration. The decoder then integrates the updated context into subsequent autoregressive decoding steps, enhancing the generative process with a more comprehensive and authentic context. Additionally, we use contrastive learning to train the model to extract reliable features from the bidirectional context and apply reinforcement learning to improve the model's utilization of this context. We evaluate our method on the CNN/DM, XSum, and NYT datasets, and the results highlight the significance of the bidirectional context. Our method achieves the best ROUGE-2 performance on CNN/DM (23.96) and performs comparably on XSum (25.45) and NYT (27.91). It also outperforms all baselines in terms of BERTScore, with scores of 89.96 on CNN/DM, 92.70 on XSum, and 90.04 on NYT. Furthermore, our method performs better with a larger beam size.
KW - Abstractive summarization
KW - Autoregressive model
KW - Contrastive learning
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85204768944&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2024.121497
DO - 10.1016/j.ins.2024.121497
M3 - Article
AN - SCOPUS:85204768944
SN - 0020-0255
VL - 689
JO - Information Sciences
JF - Information Sciences
M1 - 121497
ER -