
Tapas: enabling faithful data-to-text generation through task-adaptive pre-training with data alignment strategy

  • Xin Sun*
  • Haoran Zhang
  • Shuo Zhao
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications
  • China Mobile Research Institute

Research output: Journal article, peer-reviewed

Abstract

Data-to-text generation is the task of converting structured data into coherent, human-readable text, with applications in fields such as automated reporting and real-time information dissemination. Pre-trained language models have significantly improved the readability and coherence of generated text, but a major challenge remains: hallucination, where the generated text fails to align faithfully with the input data. These hallucinations stem primarily from two factors: the model's limited ability to understand the structural information of the data, and inconsistencies between structured data and reference texts in the training data. To address these challenges, we propose Tapas, a task-adaptive pre-training model that mitigates hallucination from both the model and data perspectives. First, we employ task-adaptive pre-training with three learning objectives, which aims to enhance the ability of pre-trained language models to learn data structure and to align structured data with reference texts. Then, during the fine-tuning phase, we incorporate a heuristic data alignment strategy to further mitigate hallucination. Experimental results indicate that Tapas achieves state-of-the-art BLEU-4 scores on the E2E and WebNLG datasets in fully supervised scenarios. In few-shot scenarios, notable improvements of 2.1% and 1.8% are observed for E2E and WebNLG, respectively. These results confirm Tapas’ effectiveness in addressing the core causes of hallucination and improving fidelity in data-to-text generation compared to baseline models.

Original language: English
Article number: 114240
Journal: Knowledge-Based Systems
Volume: 328
DOI
Publication status: Published - 25 Oct 2025
Externally published: Yes
