Partial order relation-based gene ontology embedding improves protein function prediction

Wenjing Li, Bin Wang, Jin Dai, Yan Kou, Xiaojun Chen*, Yi Pan, Shuangwei Hu*, Zhenjiang Zech Xu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

Protein annotation has long been a challenging task in computational biology. Gene Ontology (GO) has become one of the most popular frameworks to describe protein functions and their relationships. Prediction of a protein annotation with proper GO terms demands high-quality GO term representation learning, which aims to learn a low-dimensional dense vector representation with accompanying semantic meaning for each functional label, also known as embedding. However, existing GO term embedding methods, which mainly take into account ancestral co-occurrence information, have yet to capture the full topological information in the GO directed acyclic graph (DAG). In this study, we propose a novel GO term representation learning method, PO2Vec, to utilize the partial order relationships to improve the GO term representations. Extensive evaluations show that PO2Vec achieves better outcomes than existing embedding methods in a variety of downstream biological tasks. Based on PO2Vec, we further developed a new protein function prediction method, PO2GO, which demonstrates superior performance measured in multiple metrics and annotation specificity as well as few-shot prediction capability in the benchmarks. These results suggest that the high-quality representation of GO structure is critical for diverse biological tasks including computational protein annotation.

Original language: English
Article number: bbae077
Journal: Briefings in Bioinformatics
Volume: 25
Issue number: 2
Publication status: Published - 1 Mar 2024

Keywords

  • Gene Ontology
  • partial order constraint
  • protein annotation
  • protein function prediction
  • representation learning
