A Two-Phase Approach for Recognizing Tables with Complex Structures

Huichao Li, Lingze Zeng, Weiyu Zhang, Jianing Zhang, Ju Fan, Meihui Zhang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

3 Citations (Scopus)

Abstract

Tables contain rich multi-dimensional information and can be an important data source for many analytics applications. However, table structure information is often unavailable in digitized documents such as PDF or image files, which makes it hard to perform automatic analysis over high-quality table data. Recognizing table structure from digitized files is a non-trivial task, because table layouts vary greatly across files. Moreover, spanning cells further complicate the table structure and pose significant challenges for structure recognition. In this paper, we model the problem as a cell relation extraction task and propose T2, a novel two-phase approach that effectively recognizes table structures from digitized documents. T2 introduces a general concept, termed prime relation, which captures the direct relations between cells with high confidence. It further constructs an alignment graph and employs a message passing network to discover complex table structures. We validate our approach through extensive experiments on three benchmark datasets. The results demonstrate that T2 is highly robust in recognizing complex table structures.
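The following sketch makes the cell-relation formulation concrete. It encodes a small table with spanning cells as logical grid spans and derives pairwise same-row and same-column relations from them. It is an illustrative assumption only. The `Cell` class, its span fields, and `cell_relations` are hypothetical names. The sketch computes relations from known ground-truth spans rather than predicting them from a document, so it does not reproduce T2's prime relations, alignment graph, or message passing network.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell representation: text plus inclusive logical row/column spans."""
    text: str
    row_start: int
    row_end: int
    col_start: int
    col_end: int


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two inclusive integer intervals share at least one index."""
    return a_start <= b_end and b_start <= a_end


def cell_relations(cells):
    """Return (same_row, same_col): sets of index pairs (i, j) with i < j.

    Two cells are related if their row (or column) spans overlap, so a
    spanning cell can be related to several cells in the same direction.
    """
    same_row, same_col = set(), set()
    for (i, a), (j, b) in combinations(enumerate(cells), 2):
        if overlaps(a.row_start, a.row_end, b.row_start, b.row_end):
            same_row.add((i, j))
        if overlaps(a.col_start, a.col_end, b.col_start, b.col_end):
            same_col.add((i, j))
    return same_row, same_col


# Hypothetical example table: "Model" spans two rows, "Accuracy" spans two columns.
cells = [
    Cell("Model", 0, 1, 0, 0),
    Cell("Accuracy", 0, 0, 1, 2),
    Cell("Dev", 1, 1, 1, 1),
    Cell("Test", 1, 1, 2, 2),
    Cell("T2", 2, 2, 0, 0),
    Cell("0.9", 2, 2, 1, 1),
    Cell("0.8", 2, 2, 2, 2),
]

if __name__ == "__main__":
    same_row, same_col = cell_relations(cells)
    print("same row:", sorted((cells[i].text, cells[j].text) for i, j in same_row))
    print("same col:", sorted((cells[i].text, cells[j].text) for i, j in same_col))
```

In this toy table, the spanning "Accuracy" header ends up in a same-column relation with both "Dev" and "Test". This is the kind of one-to-many structure that spanning cells introduce and that makes recognition harder than a plain grid.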

Original language: English
Host publication title: Database Systems for Advanced Applications - 27th International Conference, DASFAA 2022, Proceedings
Editors: Arnab Bhattacharya, Janice Lee Mong Li, Divyakant Agrawal, P. Krishna Reddy, Mukesh Mohania, Anirban Mondal, Vikram Goyal, Rage Uday Kiran
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 587-595
Number of pages: 9
ISBN (Print): 9783031001222
DOI
Publication status: Published - 2022
Event: 27th International Conference on Database Systems for Advanced Applications, DASFAA 2022 - Virtual, Online
Duration: 11 Apr 2022 to 14 Apr 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13245 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Database Systems for Advanced Applications, DASFAA 2022
City: Virtual, Online
Period: 11/04/22 to 14/04/22

