Abstract
Protein annotation has long been a challenging task in computational biology. Gene Ontology (GO) has become one of the most popular frameworks to describe protein functions and their relationships. Predicting a protein's annotation with the proper GO terms demands high-quality GO term representation learning, which aims to learn a low-dimensional dense vector representation, also known as an embedding, that carries semantic meaning for each functional label. However, existing GO term embedding methods, which mainly take into account ancestral co-occurrence information, have yet to capture the full topological information in the GO directed acyclic graph (DAG). In this study, we propose a novel GO term representation learning method, PO2Vec, which utilizes the partial order relationships to improve the GO term representations. Extensive evaluations show that PO2Vec achieves better outcomes than existing embedding methods in a variety of downstream biological tasks. Based on PO2Vec, we further developed a new protein function prediction method, PO2GO, which demonstrates superior performance on multiple metrics and in annotation specificity, as well as few-shot prediction capability, in the benchmarks. These results suggest that a high-quality representation of the GO structure is critical for diverse biological tasks, including computational protein annotation.
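To make the phrase "partial order relationships" concrete, the following is a minimal sketch of the partial order that any DAG induces on its nodes: one term precedes another if it is the same term or a descendant of it, and two terms on different branches are incomparable. This is an illustration only and is not the PO2Vec method; the toy graph, the term names, and the helper functions `ancestors`, `precedes`, and `comparable` are all hypothetical and do not come from the paper or from the real GO.

```python
from functools import lru_cache

# Hypothetical toy DAG (child -> parents). Names are illustrative, not real GO terms.
PARENTS = {
    "root": (),
    "binding": ("root",),
    "catalytic": ("root",),
    "protein_binding": ("binding",),
    "kinase_binding": ("protein_binding",),
    "kinase_activity": ("catalytic",),
}


@lru_cache(maxsize=None)
def ancestors(term):
    """Return every strict ancestor of `term` (transitive closure of parent edges)."""
    found = set()
    for parent in PARENTS[term]:
        found.add(parent)
        found |= ancestors(parent)
    return frozenset(found)


def precedes(a, b):
    """True if a <= b in the induced partial order: a is b, or a is a descendant of b."""
    return a == b or b in ancestors(a)


def comparable(a, b):
    """True if the two terms lie on a common ancestor-descendant path."""
    return precedes(a, b) or precedes(b, a)


if __name__ == "__main__":
    print(precedes("kinase_binding", "binding"))
    print(comparable("kinase_binding", "kinase_activity"))
```

In this toy graph, `kinase_binding` precedes `binding` because it is a descendant, while `kinase_binding` and `kinase_activity` share only the root and are therefore incomparable. Incomparable pairs of this kind are structural information that goes beyond which ancestors two terms share.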
Original language | English |
---|---|
Article number | bbae077 |
Journal | Briefings in Bioinformatics |
Volume | 25 |
Issue number | 2 |
Publication status | Published - 1 Mar 2024 |
Keywords
- Gene Ontology
- partial order constraint
- protein annotation
- protein function prediction
- representation learning