TY - GEN
T1 - A Multi-Task Learning Framework for Reading Comprehension of Scientific Tabular Data
AU - Yang, Xu
AU - Zhang, Meihui
AU - Fan, Ju
AU - Luo, Zeyu
AU - Yang, Yuxin
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Tabular data in scientific papers provides valuable structured information for knowledge discovery and validation. Although language models such as BERT and ChatGPT have significantly advanced research on general-domain tables, challenges remain for scientific tables. Specifically, such models have a limited understanding of scientific entities and lack numerical representation and computation capabilities. Previous studies have addressed scientific tables, but they are limited to individual modules or tasks and lack a comprehensive framework. To address these issues, we introduce NRTR, a reading comprehension framework for scientific tables that uses a multi-task learning approach with a shared encoder to reason across various tasks, including question answering, cloze testing, and fact verification. It has the following characteristics: (1) it uses entity linking and named entity recognition to extract key information from papers, which enhances the model's understanding of scientific entities; (2) it injects numerical representation capabilities into the language model and improves the model's understanding of the relative magnitude of numbers, so that it can better reason about maximum and difference values. Notably, existing scientific corpora either lack tabular context or do not integrate computational reasoning, which hinders the evaluation of reasoning models on scientific tables. To this end, we release SciTab, a multi-task dataset that merges high-quality scientific tables with contextual information to provide a benchmark for future research. Our experimental results show that NRTR outperforms existing models on SciTab.
KW - contextual information
KW - multi-task learning
KW - numerical representation
KW - scientific domain
KW - tabular data
UR - http://www.scopus.com/inward/record.url?scp=85200439037&partnerID=8YFLogxK
U2 - 10.1109/ICDE60146.2024.00285
DO - 10.1109/ICDE60146.2024.00285
M3 - Conference contribution
AN - SCOPUS:85200439037
T3 - Proceedings - International Conference on Data Engineering
SP - 3710
EP - 3724
BT - Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
PB - IEEE Computer Society
T2 - 40th IEEE International Conference on Data Engineering, ICDE 2024
Y2 - 13 May 2024 through 17 May 2024
ER -