ProFun-SOM: Protein Function Prediction for Specific Ontology Based on Multiple Sequence Alignment Reconstruction

Jiang Yi Shao, Jun Jie Chen, Bin Liu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

Protein function prediction is crucial for understanding species evolution, including viral mutations. Gene ontology (GO) is a standardized representation framework for describing protein functions with annotated terms. Each ontology is a specific functional category containing multiple child ontologies, and the relationships between parent and child ontologies form a directed acyclic graph. GO divides protein functions into three main groups: cellular component ontology, molecular function ontology, and biological process ontology. Therefore, the GO annotation of proteins is a hierarchical multilabel classification problem. This hierarchical relationship introduces complexities such as the mixed ontology problem, leading to performance bottlenecks in existing computational methods due to label dependency and data sparsity. To overcome the bottlenecks caused by the mixed ontology problem, we propose ProFun-SOM, an innovative multilabel classifier that utilizes multiple sequence alignments (MSAs) to accurately annotate gene ontologies. ProFun-SOM enhances the initial MSAs through a reconstruction process and integrates them into a deep learning architecture. It then predicts annotations within the cellular component, molecular function, biological process, and mixed ontologies. Our evaluation results on three datasets (CAFA3, SwissProt, and NetGO2) demonstrate that ProFun-SOM surpasses state-of-the-art methods. This study confirms that utilizing MSAs of proteins can effectively overcome the two main bottlenecks, label dependency and data sparsity, thereby alleviating the root problem, mixed ontology.
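
To illustrate why GO annotation is a hierarchical multilabel problem, here is a minimal Python sketch. It is not the ProFun-SOM method. It only shows the widely used GO convention that annotating a protein with a term also implies all of that term's ancestors in the directed acyclic graph, and how the closed label set becomes a multilabel target vector. The term identifiers, the `PARENTS` map, and the helper names are hypothetical and are not real GO terms.

```python
from typing import Dict, List, Set

# Toy parent map: child term -> set of parent terms.
# Hypothetical identifiers for illustration only, not real GO terms.
PARENTS: Dict[str, Set[str]] = {
    "T:root": set(),
    "T:binding": {"T:root"},
    "T:catalysis": {"T:root"},
    "T:ion_binding": {"T:binding"},
    # A term may have several parents, which makes the graph a DAG, not a tree.
    "T:metal_enzyme": {"T:ion_binding", "T:catalysis"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def propagate(labels: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Close a label set under the ancestor relation."""
    closed = set(labels)
    for term in labels:
        closed |= ancestors(term, parents)
    return closed


def to_multilabel_vector(labels: Set[str], vocabulary: List[str]) -> List[int]:
    """Encode a label set as a 0/1 vector over a fixed term vocabulary."""
    return [1 if term in labels else 0 for term in vocabulary]


VOCAB = sorted(PARENTS)  # fixed column order for the target vector

if __name__ == "__main__":
    closed = propagate({"T:ion_binding"}, PARENTS)
    print(sorted(closed))
    print(to_multilabel_vector(closed, VOCAB))
```

Because each label drags its ancestors along, the columns of the target vector are not independent (label dependency), and deep, specific terms have few positive examples (data sparsity).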

Original language: English
Pages (from-to): 8060-8071
Number of pages: 12
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 36
Issue number: 5
Publication status: Published - 2025
Externally published: Yes

Keywords

  • Gene ontology (GO) annotation
  • multilabel classification
  • multiple sequence alignment (MSA) reconstruction
  • protein function prediction
  • specific ontology predictor
