A Two-Phase Approach for Recognizing Tables with Complex Structures

Huichao Li, Lingze Zeng, Weiyu Zhang, Jianing Zhang, Ju Fan, Meihui Zhang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Tables contain rich multi-dimensional information that can be an important source for many data analytics applications. However, table structure information is often unavailable in digitized documents such as PDF or image files, making it hard to perform automatic analysis over high-quality table data. Table structure recognition from digitized files is a non-trivial task, as table layouts often vary greatly across files. Moreover, the existence of spanning cells further complicates the table structure and poses significant challenges for table structure recognition. In this paper, we model the problem as a cell relation extraction task and propose T2, a novel two-phase approach that effectively recognizes table structures from digitized documents. T2 introduces a general concept termed prime relation, which captures the direct relations of cells with high confidence. It further constructs an alignment graph and employs a message passing network to discover complex table structures. We validate our approach via extensive experiments over three benchmark datasets. The results demonstrate that T2 is highly robust for recognizing complex table structures.
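To make the "cell relation extraction" framing concrete, the following minimal sketch shows what pairwise cell relations could look like for a small table with spanning cells. It is only an illustration of the kind of output the task is after, not T2's method: it derives relations from known row and column spans, whereas a recognizer has to predict them from a PDF or image. The `Cell` class, the `cell_relations` function, and the example table are all hypothetical names and data introduced here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cell:
    """A table cell with inclusive row/column spans (hypothetical layout)."""
    text: str
    row_start: int
    row_end: int
    col_start: int
    col_end: int


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two inclusive integer ranges share at least one index."""
    return a_start <= b_end and b_start <= a_end


def cell_relations(cells):
    """Map each unordered cell pair to (same_row, same_column) flags.

    Two cells count as row-related if their row spans overlap, and as
    column-related if their column spans overlap. This is an assumed,
    simplified definition chosen for illustration.
    """
    relations = {}
    for a, b in combinations(cells, 2):
        same_row = overlaps(a.row_start, a.row_end, b.row_start, b.row_end)
        same_col = overlaps(a.col_start, a.col_end, b.col_start, b.col_end)
        relations[(a.text, b.text)] = (same_row, same_col)
    return relations


# A two-row header: "Region" spans both rows, "Sales" spans two columns.
cells = [
    Cell("Region", 0, 1, 0, 0),
    Cell("Sales", 0, 0, 1, 2),
    Cell("Q1", 1, 1, 1, 1),
    Cell("Q2", 1, 1, 2, 2),
]
relations = cell_relations(cells)
```

In this example the spanning cell "Sales" is column-related to both "Q1" and "Q2", and "Region" is row-related to every other cell, which is the kind of structure that a flat row-by-column grid cannot express directly.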

Original language: English
Title of host publication: Database Systems for Advanced Applications - 27th International Conference, DASFAA 2022, Proceedings
Editors: Arnab Bhattacharya, Janice Lee Mong Li, Divyakant Agrawal, P. Krishna Reddy, Mukesh Mohania, Anirban Mondal, Vikram Goyal, Rage Uday Kiran
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 587-595
Number of pages: 9
ISBN (Print): 9783031001222
Publication status: Published - 2022
Event: 27th International Conference on Database Systems for Advanced Applications, DASFAA 2022 - Virtual, Online
Duration: 11 Apr 2022 to 14 Apr 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13245 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Database Systems for Advanced Applications, DASFAA 2022
City: Virtual, Online
Period: 11/04/22 to 14/04/22

Keywords

  • Data mining
  • Message passing networks
  • Table structure recognition
