
Bio-RFX: Refining Biomedical Extraction via Advanced Relation Classification and Structural Constraints

  • Minjia Wang
  • Fangzhou Liu
  • Xiuxing Li*
  • Bowen Dong
  • Zhenyu Li
  • Tengyu Pan
  • Jianyong Wang*
  • *Corresponding author for this work
  • Tsinghua University
  • Harvard University
  • Nanjing University of Aeronautics and Astronautics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The ever-growing volume of biomedical publications magnifies the challenge of extracting structured data from unstructured text. This task involves two components: identifying biomedical entities (Named Entity Recognition, NER) and determining the relations between them (Relation Extraction, RE). However, existing methods often neglect distinctive features of the biomedical literature, such as ambiguous entities, nested proper nouns, and overlapping relation triplets, and they underutilize prior knowledge. This leads to a severe performance decline in the biomedical domain, especially when annotated training data are limited. In this paper, we propose the Biomedical Relation-First eXtraction (Bio-RFX) model, which performs sentence-level relation classification before entity extraction to tackle entity ambiguity. Moreover, we exploit structural constraints between entities and relations to guide the model's hypothesis space, enhancing extraction performance across different training scenarios. Comprehensive experiments on biomedical datasets show that Bio-RFX achieves significant improvements on both NER and RE. Even in low-resource training scenarios, it outperforms all baselines on NER and is highly competitive with state-of-the-art fine-tuned baselines on RE.
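The relation-first ordering and the entity/relation structural constraints described in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical relation schema (`SCHEMA`), sentence-level relation scores already produced by some classifier, and entity hypotheses from an upstream NER step. The names `classify_relations`, `allowed_entity_types`, `extract_triplets`, the 0.5 threshold, and all entity and relation types are illustrative assumptions, not the Bio-RFX architecture or code. In the sketch, the predicted relations first narrow which entity types are admissible, and the schema then decides which head/tail pairs may form a triplet.

```python
from dataclasses import dataclass

# HYPOTHETICAL schema for illustration only (not from the paper): each relation
# type lists the (head_type, tail_type) pairs it is allowed to connect.
SCHEMA: dict[str, set[tuple[str, str]]] = {
    "treats": {("Chemical", "Disease")},
    "interacts_with": {("Chemical", "Chemical"), ("Gene", "Gene")},
    "associated_with": {("Gene", "Disease")},
}


@dataclass(frozen=True)
class Entity:
    text: str   # surface mention in the sentence
    etype: str  # entity type proposed by an upstream NER step (assumed)


def classify_relations(scores: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Relation-first step: keep relation types whose sentence-level score
    (assumed to come from some classifier) reaches the threshold."""
    return {rel for rel, p in scores.items() if p >= threshold}


def allowed_entity_types(relations: set[str]) -> set[str]:
    """Entity types that can fill a head or tail slot of a predicted relation."""
    return {etype for rel in relations for pair in SCHEMA[rel] for etype in pair}


def extract_triplets(
    scores: dict[str, float], entities: list[Entity], threshold: float = 0.5
) -> list[tuple[str, str, str]]:
    relations = classify_relations(scores, threshold)
    # Constraint 1: discard entity hypotheses whose type no predicted relation can use.
    types = allowed_entity_types(relations)
    kept = [e for e in entities if e.etype in types]
    # Constraint 2: pair entities only when (head type, tail type) matches the schema.
    triplets = []
    for rel in sorted(relations):
        for head in kept:
            for tail in kept:
                if head.text != tail.text and (head.etype, tail.etype) in SCHEMA[rel]:
                    triplets.append((head.text, rel, tail.text))
    return triplets


# Illustrative inputs with placeholder names (no real biomedical claims).
scores = {"treats": 0.91, "interacts_with": 0.12, "associated_with": 0.30}
entities = [Entity("DrugA", "Chemical"), Entity("DiseaseB", "Disease"), Entity("GeneC", "Gene")]
print(extract_triplets(scores, entities))  # [('DrugA', 'treats', 'DiseaseB')]
```

Lowering `threshold` to 0.2 also admits `associated_with`, which adds `('GeneC', 'associated_with', 'DiseaseB')` ahead of the `treats` triplet because relations are visited in sorted order. Filtering by relation before pairing keeps the candidate space small, which is the intuition behind the relation-first design; a learned model would replace the hand-written schema and fixed scores used here.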

Original language: English
Title of host publication: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics (ACL)
Pages: 10524-10539
Number of pages: 16
ISBN (Electronic): 9798891761643
Publication status: Published, 2024
Event: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Hybrid, Miami, United States
Duration: 12 Nov 2024 to 16 Nov 2024

Publication series

Name: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
Country/Territory: United States
City: Hybrid, Miami
Period: 12/11/24 to 16/11/24

