A Multi-Task Learning Framework for Reading Comprehension of Scientific Tabular Data

Xu Yang, Meihui Zhang*, Ju Fan, Zeyu Luo, Yuxin Yang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Tabular data in scientific papers provides valuable structured information for knowledge discovery and validation. Although language models such as BERT and ChatGPT have significantly advanced research on general-domain tables, challenges remain for scientific tables. Specifically, such models have a limited understanding of scientific entities and lack numerical representation and computation capabilities. Previous studies have addressed scientific tables, but they are limited to individual modules or tasks and lack a comprehensive framework. To address these issues, we introduce NRTR, a reading comprehension framework for scientific tables that uses a multi-task learning approach with a shared encoder to reason across various tasks, including question answering, cloze testing, and fact verification. It has the following characteristics: (1) it uses entity linking and named entity recognition to extract key information from papers, which enhances the model's understanding of scientific entities; (2) it injects numerical representation capabilities into language models and promotes the model's understanding of the relative magnitude of numbers, so that it can better reason about maximum and difference values. Notably, existing scientific corpora lack tabular contexts or do not integrate computational reasoning, which hinders the evaluation of reasoning models on scientific tables. To this end, we release SciTab, a multi-task dataset that merges high-quality scientific tables with contextual information to provide a benchmark for future research. Our experimental results show that NRTR outperforms existing models on SciTab.

Original language: English
Title of host publication: Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
Publisher: IEEE Computer Society
Pages: 3710-3724
Number of pages: 15
ISBN (Electronic): 9798350317152
Publication status: Published - 2024
Event: 40th IEEE International Conference on Data Engineering, ICDE 2024 - Utrecht, Netherlands
Duration: 13 May 2024 → 17 May 2024

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627
ISSN (Electronic): 2375-0286

Conference

Conference: 40th IEEE International Conference on Data Engineering, ICDE 2024
Country/Territory: Netherlands
City: Utrecht
Period: 13/05/24 → 17/05/24

Keywords

  • contextual information
  • multi-task learning
  • numerical representation
  • scientific domain
  • tabular data
