DSciSum: Detailed summarization of long scientific documents

Ran Liu, Xian Ling Mao*, Heyan Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Academics frequently treat a summary as a viable alternative to reading a long scientific document. Prior studies generally required well-annotated training datasets such as arXiv and PubMed, using the abstracts of articles as supervised signals. However, these gold summaries provide only a cursory overview of the subject matter and lack crucial detailed information, such as datasets, evaluation metrics and model performance, which is essential for both academics and the general public. To address this problem, we propose DSciSum, an extract-then-generate framework that uses the zero-shot capabilities and superior semantic understanding of large language models (LLMs). This approach focuses on previously overlooked details, thereby generating more human-related summaries. Moreover, an innovative LLM-based evaluation criterion is designed as a substitute for traditional metrics, providing a more meaningful and professional assessment for scientific summarization. Specifically, DSciSum first selects salient sentences containing both general and detailed information using a statistics-based heuristic approach. Thereafter, it pretrains and fine-tunes LLMs to obtain a generator tailored for scientific summarization. Finally, G-SciEval is designed to provide a human-related evaluation of scientific summarization from a deep semantic perspective. Experimental results show that DSciSum outperforms both the reference and state-of-the-art models on arXivCap.
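
The abstract does not specify the statistics-based heuristic used in the extraction step, so the following is only an illustrative sketch of what such a selector might look like, not the authors' method. It assumes a simple average term-frequency score for "general" salience and a fixed bonus for sentences that contain numbers or cue words for "detailed" information. The function names, the cue-word list, and the `k` and `detail_bonus` parameters are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical cue words for "detailed" content; not taken from the paper.
DETAIL_CUES = {"dataset", "datasets", "metric", "metrics",
               "accuracy", "rouge", "f1", "baseline"}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphanumeric tokens; punctuation is discarded.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def select_salient(text, k=3, detail_bonus=1.0):
    """Return up to k sentences, kept in their original document order."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    # Document-level term frequencies act as the "general" salience signal.
    freq = Counter(t for toks in tokens for t in toks)

    scored = []
    for idx, toks in enumerate(tokens):
        if not toks:
            continue
        general = sum(freq[t] for t in toks) / len(toks)
        # Assumed "detail" signal: a number or a cue word appears.
        has_detail = any(t in DETAIL_CUES or t.isdigit() for t in toks)
        score = general + (detail_bonus if has_detail else 0.0)
        scored.append((score, idx))

    top = sorted(scored, reverse=True)[:k]
    return [sentences[i] for i in sorted(i for _, i in top)]
```

In an extract-then-generate pipeline, the sentences returned by a selector like `select_salient` would be passed to the LLM generator as its input. A practical version would at least filter stop words and normalize for sentence length, which this sketch omits.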

Original language: English
Article number: 113409
Journal: Knowledge-Based Systems
Volume: 317
Publication status: Published - 23 May 2025

Keywords

  • Automatic text summarization
  • Evaluation
  • Large language models
  • Long document summarization
