Active learning algorithm for threshold of decision probability on imbalanced text classification based on protein-protein interaction documents

  • Guixian Xu*
  • Zhendong Niu
  • Xu Gao
  • Yujuan Cao
  • Yumin Zhao
  • *Corresponding author for this work
3 Citations (Scopus)

Abstract

The study of host-pathogen protein-protein interactions (PPIs) is essential to understanding the disease-causing mechanisms of human pathogens. A large number of scientific findings about PPIs are reported in the biomedical literature. Building a document classification system can accelerate the mining and curation of PPI knowledge. As imbalanced data sets become increasingly common, handling the imbalanced classification problem has become a prominent topic in machine learning. In this paper, we propose an Active Learning algorithm for Threshold of Decision Probability (ALTDP) to address the problem of misclassifying the minority class on an imbalanced host-pathogen PPI data set. The results demonstrate that the proposed approach significantly improves classification accuracy on the imbalanced data set.
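The abstract does not describe how ALTDP works internally, so the following is not the authors' algorithm. It is only a minimal sketch of the general idea of moving a decision-probability threshold away from the default 0.5 so that fewer minority-class documents are missed. The function names, the candidate thresholds, and the toy probabilities and labels are all hypothetical, and minority-class F1 is an assumed selection criterion.

```python
from typing import Sequence, Tuple


def f1_at_threshold(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 for the positive (minority) class when predicting 1 if prob >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(
    probs: Sequence[float], labels: Sequence[int], candidates: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, f1) with the highest minority-class F1.

    Ties keep the earliest candidate in the list.
    """
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        score = f1_at_threshold(probs, labels, t)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


# Hypothetical toy data: classifier probabilities that a document is PPI-relevant
# (label 1 = minority class, label 0 = majority class).
toy_probs = [0.9, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05, 0.45]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_candidates = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5]

default_f1 = f1_at_threshold(toy_probs, toy_labels, 0.5)
best_threshold, best_f1 = pick_threshold(toy_probs, toy_labels, toy_candidates)
print(f"default 0.5 -> F1 {default_f1:.3f}; chosen {best_threshold} -> F1 {best_f1:.3f}")
```

On this toy data the default threshold of 0.5 recovers only one of the three minority-class documents, while a lower threshold recovers all three at the cost of one false positive. In practice the threshold would be chosen on held-out labeled data rather than on the data used for evaluation.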

Original language: English
Title of host publication: DSDE 2010 - International Conference on Data Storage and Data Engineering
Pages: 78-82
Number of pages: 5
Publication status: Published - 2010
Event: 2010 International Conference on Data Storage and Data Engineering, DSDE 2010 - Bangalore, India
Duration: 9 Feb 2010 → 10 Feb 2010

Publication series

Name: DSDE 2010 - International Conference on Data Storage and Data Engineering

Conference

Conference: 2010 International Conference on Data Storage and Data Engineering, DSDE 2010
Country/Territory: India
City: Bangalore
Period: 9/02/10 → 10/02/10

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Imbalanced text classification
  • Machine learning
  • Protein-protein interaction
