摘要
Protein annotation has long been a challenging task in computational biology.Gene Ontology (GO) has become one of the most popular frameworks to describe protein functions and their relationships. Prediction of a protein annotation with proper GO terms demands high-quality GO term representation learning, which aims to learn a low-dimensional dense vector representation with accompanying semantic meaning for each functional label, also known as embedding. However, existing GO term embedding methods, which mainly take into account ancestral co-occurrence information, have yet to capture the full topological information in the GO-directed acyclic graph (DAG).In this study,we propose a novel GO term representation learning method,PO2Vec,to utilize the partial order relationships to improve the GO term representations. Extensive evaluations show that PO2Vec achieves better outcomes than existing embedding methods in a variety of downstream biological tasks. Based on PO2Vec, we further developed a new protein function prediction method PO2GO, which demonstrates superior performance measured in multiple metrics and annotation specificity as well as few-shot prediction capability in the benchmarks. These results suggest that the high-quality representation of GO structure is critical for diverse biological tasks including computational protein annotation.
源语言 | 英语 |
---|---|
文章编号 | bbae077 |
期刊 | Briefings in Bioinformatics |
卷 | 25 |
期 | 2 |
DOI | |
出版状态 | 已出版 - 1 3月 2024 |