TY - GEN
T1 - TabAgent
T2 - 2025 5th International Symposium on Artificial Intelligence and Big Data, AIBDF 2025
AU - Wu, Jingfei
AU - Han, Junyi
AU - Gao, Yujin
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - With the increasing amount of unstructured documents in various domains, extracting structured data from such sources has become critical for efficient data management and advanced analytics. However, existing methods for structured table extraction face two challenges: (1) the complexity and implicitness of semantic context and patterns in unstructured documents hinder accurate information extraction, and (2) these methods adapt poorly to evolving user intents and to requirements for strict schema alignment and structural constraints. To address these limitations, we propose TabAgent, a novel multi-agent collaborative framework for structured table extraction from unstructured documents. TabAgent integrates four specialized agents (Schema Agent, Extraction Agent, Semantic Agent, and Validation Agent) with a shared memory repository to iteratively refine extraction results. By leveraging collaborative reasoning and iterative self-correcting loops, TabAgent enables accurate, adaptive, and robust table extraction across diverse document domains and user instructions. Extensive experiments on four datasets demonstrate that TabAgent consistently outperforms several baselines, including pure LLM extractors and LLM-based systems, highlighting the effectiveness of this collaborative framework. Our work represents one of the first multi-agent frameworks for structured table extraction, offering an applicable solution for real-world applications.
KW - Large Language Models
KW - Multi-agent Systems
KW - Structured Information Extraction
UR - https://www.scopus.com/pages/publications/105036858396
U2 - 10.1109/AIBDF67964.2025.11440749
DO - 10.1109/AIBDF67964.2025.11440749
M3 - Conference contribution
AN - SCOPUS:105036858396
T3 - Proceedings of 2025 5th International Symposium on Artificial Intelligence and Big Data, AIBDF 2025
SP - 600
EP - 607
BT - Proceedings of 2025 5th International Symposium on Artificial Intelligence and Big Data, AIBDF 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 26 December 2025 through 28 December 2025
ER -