Toward Balanced Denoising: Building a Structural and Textual Denoiser for Table Understanding

Research output: Contribution to journal › Article › peer-review

Abstract

Recently, large language models (LLMs) have made remarkable progress in table understanding, yet they remain vulnerable to structural noise (SN) and textual noise (TN). Existing methods usually reduce these two types of noise either with biased denoising strategies, such as structural matching and textual filtering, or with overzealous ones, such as introducing supplementary tasks like text-to-SQL and table-to-text. However, these methods either neglect one type of noise or introduce substantial external noise. How to mitigate structural and textual noise simultaneously, without introducing extra noise, and thereby improve the performance of LLMs in table understanding therefore remains an open problem. In this paper, we rethink the bottlenecks in table understanding from the perspective of noise reduction and propose a novel dual-denoiser-reasoner model, called TabDDR, for balanced and effective denoising. Specifically, our model consists of a structural-and-textual denoiser and a task-adaptive reasoner. The former removes both types of noise via triplet alignment and planning extraction, seeking an interpretable balance between breaking structural barriers and preserving structural characteristics, and between eliminating textual noise and retaining maximal information; the latter ensures a simple but effective reasoning process that can adapt to various downstream tasks. To highlight the presence and impact of structural and textual noise, we construct the WTQ-SN and WTQ-TN datasets based on the WikiTableQuestions (WTQ) dataset. Extensive experiments on these self-constructed datasets and two other public datasets demonstrate that our proposed method outperforms state-of-the-art baselines.
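The abstract does not specify how triplet alignment or planning extraction work. As a purely illustrative sketch, assuming that a "triplet" is a (row key, column header, cell value) fact, the Python below shows one way a table can be flattened so its 2-D layout is broken up while every cell keeps its row and column context, followed by a naive keyword filter standing in for textual denoising. The names `table_to_triplets` and `filter_triplets` are hypothetical; this is not TabDDR's algorithm.

```python
from typing import List, Tuple

# Hypothetical representation: (row key, column header, cell value).
Triplet = Tuple[str, str, str]


def table_to_triplets(header: List[str], rows: List[List[str]], key_col: int = 0) -> List[Triplet]:
    """Flatten a table into (row key, column header, cell value) triplets.

    Each row is identified by its value in `key_col`; every other cell becomes
    one triplet, so the grid layout is removed but each cell still records
    which row and column it came from.
    """
    triplets: List[Triplet] = []
    for row in rows:
        key = row[key_col]
        for col, (name, value) in enumerate(zip(header, row)):
            if col != key_col:
                triplets.append((key, name, value))
    return triplets


def filter_triplets(triplets: List[Triplet], question: str) -> List[Triplet]:
    """Naive stand-in for textual denoising (an assumption, not the paper's method).

    Keeps a triplet if its row key OR its column header occurs as a substring
    of the question, favoring recall over precision.
    """
    q = question.lower()
    return [t for t in triplets if t[0].lower() in q or t[1].lower() in q]


if __name__ == "__main__":
    header = ["City", "Population", "Country"]
    rows = [["Paris", "2100000", "France"], ["Lyon", "520000", "France"]]
    facts = table_to_triplets(header, rows)
    print(filter_triplets(facts, "What is the population of Paris?"))
    # → [('Paris', 'Population', '2100000'), ('Paris', 'Country', 'France'), ('Lyon', 'Population', '520000')]
```

Using OR rather than AND in the filter keeps facts a question might need for comparison (here, Lyon's population) at the cost of retaining some irrelevant ones, which loosely mirrors the trade-off the abstract describes between eliminating noise and retaining information.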

Original language: English
Pages (from-to): 7414-7425
Number of pages: 12
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 37
Issue number: 12
Publication status: Published - 2025

Keywords

  • Table understanding
  • dual-denoiser-reasoner
  • large language models
  • structural noise
  • table-based fact verification
  • table-based question answering
  • textual noise
