
TabAgent: A Multi-Agent Table Extraction Framework for Unstructured Documents

  • Jingfei Wu
  • Junyi Han
  • Yujin Gao*
  • *Corresponding author for this work
  • Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

With the growing number of unstructured documents across domains, extracting structured data from such sources has become critical for efficient data management and advanced analytics. However, existing methods for structured table extraction face two challenges: (1) the complexity and implicitness of semantic context and patterns in unstructured documents hinder accurate information extraction, and (2) they have limited adaptability to evolving user intents and to requirements for strict schema alignment and structural constraints. To address these limitations, we propose TabAgent, a novel multi-agent collaborative framework for structured table extraction from unstructured documents. TabAgent integrates four specialized agents (Schema Agent, Extraction Agent, Semantic Agent, and Validation Agent) with a shared memory repository to iteratively refine extraction results. By leveraging collaborative reasoning and iterative self-correcting loops, TabAgent enables accurate, adaptive, and robust table extraction across diverse document domains and user instructions. Extensive experiments on four datasets demonstrate that TabAgent consistently outperforms several baselines, including pure LLM extractors and LLM-based systems, highlighting the effectiveness of this collaborative framework. Our work represents one of the first multi-agent frameworks for structured table extraction, offering a practical solution for real-world applications.

Original language: English
Title of host publication: Proceedings of 2025 5th International Symposium on Artificial Intelligence and Big Data, AIBDF 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 600-607
Number of pages: 8
ISBN (Electronic): 9798331569921
Publication status: Published - 2025
Externally published: Yes
Event: 2025 5th International Symposium on Artificial Intelligence and Big Data, AIBDF 2025 - Guiyang, China
Duration: 26 Dec 2025 to 28 Dec 2025

Publication series

Name: Proceedings of 2025 5th International Symposium on Artificial Intelligence and Big Data, AIBDF 2025

Conference

Conference: 2025 5th International Symposium on Artificial Intelligence and Big Data, AIBDF 2025
Country/Territory: China
City: Guiyang
Period: 26/12/25 to 28/12/25

Keywords

  • Large Language Models
  • Multi-agent Systems
  • Structured Information Extraction
