TY - JOUR
T1 - Screening through a broad pool
T2 - Towards better diversity for lexically constrained text generation
AU - Yuan, Changsen
AU - Huang, Heyan
AU - Cao, Yixin
AU - Cao, Qianwen
N1 - Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2024/3
Y1 - 2024/3
N2 - Lexically constrained text generation (CTG) is the task of generating text that contains given constraint keywords. However, the text diversity of existing models is still unsatisfactory. In this paper, we propose a lightweight dynamic refinement strategy that increases the randomness of inference to improve generation richness and diversity while maintaining a high level of fluency and integrity. Our basic idea is to enlarge the number and length of candidate sentences in each iteration and choose the best one for subsequent refinement. On the one hand, unlike previous works, which carefully insert one token between two words per action, we insert a variable number of tokens drawn from a well-designed distribution. To ensure high-quality decoding, the insertion number increases as more words are generated. On the other hand, we randomly mask an increasing number of generated words to force Pre-trained Language Models (PLMs) to examine the whole sentence via reconstruction. We have conducted extensive experiments and designed four dimensions for human evaluation. Compared with the strong baseline CBART (He, 2021), our method improves by 1.3% (B-2), 0.1% (B-4), 0.016 (N-2), 0.016 (N-4), 5.7% (M), 1.9% (SB-4), 0.6% (D-2), and 0.5% (D-4) on the One-Billion-Word dataset (Chelba et al., 2014) and by 1.6% (B-2), 0.1% (B-4), 0.121 (N-2), 0.120 (N-4), 0.0% (M), 6.7% (SB-4), 2.7% (D-2), and 3.8% (D-4) on the Yelp dataset (Cho et al., 2018). The results demonstrate that our method generates more diverse and plausible text.
AB - Lexically constrained text generation (CTG) is the task of generating text that contains given constraint keywords. However, the text diversity of existing models is still unsatisfactory. In this paper, we propose a lightweight dynamic refinement strategy that increases the randomness of inference to improve generation richness and diversity while maintaining a high level of fluency and integrity. Our basic idea is to enlarge the number and length of candidate sentences in each iteration and choose the best one for subsequent refinement. On the one hand, unlike previous works, which carefully insert one token between two words per action, we insert a variable number of tokens drawn from a well-designed distribution. To ensure high-quality decoding, the insertion number increases as more words are generated. On the other hand, we randomly mask an increasing number of generated words to force Pre-trained Language Models (PLMs) to examine the whole sentence via reconstruction. We have conducted extensive experiments and designed four dimensions for human evaluation. Compared with the strong baseline CBART (He, 2021), our method improves by 1.3% (B-2), 0.1% (B-4), 0.016 (N-2), 0.016 (N-4), 5.7% (M), 1.9% (SB-4), 0.6% (D-2), and 0.5% (D-4) on the One-Billion-Word dataset (Chelba et al., 2014) and by 1.6% (B-2), 0.1% (B-4), 0.121 (N-2), 0.120 (N-4), 0.0% (M), 6.7% (SB-4), 2.7% (D-2), and 3.8% (D-4) on the Yelp dataset (Cho et al., 2018). The results demonstrate that our method generates more diverse and plausible text.
KW - Constrained text generation
KW - Pre-trained language models
KW - Randomly insert
KW - Randomly mask
KW - Text diversity
UR - http://www.scopus.com/inward/record.url?scp=85178609541&partnerID=8YFLogxK
U2 - 10.1016/j.ipm.2023.103602
DO - 10.1016/j.ipm.2023.103602
M3 - Article
AN - SCOPUS:85178609541
SN - 0306-4573
VL - 61
JO - Information Processing and Management
JF - Information Processing and Management
IS - 2
M1 - 103602
ER -