Graph-based prediction of Protein-protein interactions with attributed signed graph embedding

Fang Yang; Kunjie Fan; Dandan Song; Huakang Lin

doi:10.1186/s12859-020-03646-8

Graph-based prediction of Protein-protein interactions with attributed signed graph embedding

Fang Yang, Kunjie Fan, Dandan Song^*, Huakang Lin

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to journal › Article › peer-review

98 Citations (Scopus)

Abstract

Background: Protein-protein interactions (PPIs) are central to many biological processes. Considering that the experimental methods for identifying PPIs are time-consuming and expensive, it is important to develop automated computational methods to better predict PPIs. Various machine learning methods have been proposed, including a deep learning technique which is sequence-based that has achieved promising results. However, it only focuses on sequence information while ignoring the structural information of PPI networks. Structural information of PPI networks such as their degree, position, and neighboring nodes in a graph has been proved to be informative in PPI prediction. Results: Facing the challenge of representing graph information, we introduce an improved graph representation learning method. Our model can study PPI prediction based on both sequence information and graph structure. Moreover, our study takes advantage of a representation learning model and employs a graph-based deep learning method for PPI prediction, which shows superiority over existing sequence-based methods. Statistically, Our method achieves state-of-the-art accuracy of 99.15% on Human protein reference database (HPRD) dataset and also obtains best results on Database of Interacting Protein (DIP) Human, Drosophila, Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegan) datasets. Conclusion: Here, we introduce signed variational graph auto-encoder (S-VGAE), an improved graph representation learning method, to automatically learn to encode graph structure into low-dimensional embeddings. Experimental results demonstrate that our method outperforms other existing sequence-based methods on several datasets. We also prove the robustness of our model for very sparse networks and the generalization for a new dataset that consists of four datasets: HPRD, E.coli, C.elegan, and Drosophila.

Original language	English
Article number	323
Journal	BMC Bioinformatics
Volume	21
Issue number	1
DOIs	https://doi.org/10.1186/s12859-020-03646-8
Publication status	Published - 21 Jul 2020

Keywords

Network embedding
Protein-protein interaction
Representation learning
Variational graph auto-encoder

Access to Document

10.1186/s12859-020-03646-8

Cite this

@article{7b5cf8e3768649cf8840eab8e51ac63f,

title = "Graph-based prediction of Protein-protein interactions with attributed signed graph embedding",

abstract = "Background: Protein-protein interactions (PPIs) are central to many biological processes. Considering that the experimental methods for identifying PPIs are time-consuming and expensive, it is important to develop automated computational methods to better predict PPIs. Various machine learning methods have been proposed, including a deep learning technique which is sequence-based that has achieved promising results. However, it only focuses on sequence information while ignoring the structural information of PPI networks. Structural information of PPI networks such as their degree, position, and neighboring nodes in a graph has been proved to be informative in PPI prediction. Results: Facing the challenge of representing graph information, we introduce an improved graph representation learning method. Our model can study PPI prediction based on both sequence information and graph structure. Moreover, our study takes advantage of a representation learning model and employs a graph-based deep learning method for PPI prediction, which shows superiority over existing sequence-based methods. Statistically, Our method achieves state-of-the-art accuracy of 99.15% on Human protein reference database (HPRD) dataset and also obtains best results on Database of Interacting Protein (DIP) Human, Drosophila, Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegan) datasets. Conclusion: Here, we introduce signed variational graph auto-encoder (S-VGAE), an improved graph representation learning method, to automatically learn to encode graph structure into low-dimensional embeddings. Experimental results demonstrate that our method outperforms other existing sequence-based methods on several datasets. We also prove the robustness of our model for very sparse networks and the generalization for a new dataset that consists of four datasets: HPRD, E.coli, C.elegan, and Drosophila.",

keywords = "Network embedding, Protein-protein interaction, Representation learning, Variational graph auto-encoder",

author = "Fang Yang and Kunjie Fan and Dandan Song and Huakang Lin",

note = "Publisher Copyright: {\textcopyright} 2020 The Author(s).",

year = "2020",

month = jul,

day = "21",

doi = "10.1186/s12859-020-03646-8",

language = "English",

volume = "21",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central Ltd.",

number = "1",

}

TY - JOUR

T1 - Graph-based prediction of Protein-protein interactions with attributed signed graph embedding

AU - Yang, Fang

AU - Fan, Kunjie

AU - Song, Dandan

AU - Lin, Huakang

PY - 2020/7/21

Y1 - 2020/7/21

N2 - Background: Protein-protein interactions (PPIs) are central to many biological processes. Considering that the experimental methods for identifying PPIs are time-consuming and expensive, it is important to develop automated computational methods to better predict PPIs. Various machine learning methods have been proposed, including a deep learning technique which is sequence-based that has achieved promising results. However, it only focuses on sequence information while ignoring the structural information of PPI networks. Structural information of PPI networks such as their degree, position, and neighboring nodes in a graph has been proved to be informative in PPI prediction. Results: Facing the challenge of representing graph information, we introduce an improved graph representation learning method. Our model can study PPI prediction based on both sequence information and graph structure. Moreover, our study takes advantage of a representation learning model and employs a graph-based deep learning method for PPI prediction, which shows superiority over existing sequence-based methods. Statistically, Our method achieves state-of-the-art accuracy of 99.15% on Human protein reference database (HPRD) dataset and also obtains best results on Database of Interacting Protein (DIP) Human, Drosophila, Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegan) datasets. Conclusion: Here, we introduce signed variational graph auto-encoder (S-VGAE), an improved graph representation learning method, to automatically learn to encode graph structure into low-dimensional embeddings. Experimental results demonstrate that our method outperforms other existing sequence-based methods on several datasets. We also prove the robustness of our model for very sparse networks and the generalization for a new dataset that consists of four datasets: HPRD, E.coli, C.elegan, and Drosophila.

AB - Background: Protein-protein interactions (PPIs) are central to many biological processes. Considering that the experimental methods for identifying PPIs are time-consuming and expensive, it is important to develop automated computational methods to better predict PPIs. Various machine learning methods have been proposed, including a deep learning technique which is sequence-based that has achieved promising results. However, it only focuses on sequence information while ignoring the structural information of PPI networks. Structural information of PPI networks such as their degree, position, and neighboring nodes in a graph has been proved to be informative in PPI prediction. Results: Facing the challenge of representing graph information, we introduce an improved graph representation learning method. Our model can study PPI prediction based on both sequence information and graph structure. Moreover, our study takes advantage of a representation learning model and employs a graph-based deep learning method for PPI prediction, which shows superiority over existing sequence-based methods. Statistically, Our method achieves state-of-the-art accuracy of 99.15% on Human protein reference database (HPRD) dataset and also obtains best results on Database of Interacting Protein (DIP) Human, Drosophila, Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegan) datasets. Conclusion: Here, we introduce signed variational graph auto-encoder (S-VGAE), an improved graph representation learning method, to automatically learn to encode graph structure into low-dimensional embeddings. Experimental results demonstrate that our method outperforms other existing sequence-based methods on several datasets. We also prove the robustness of our model for very sparse networks and the generalization for a new dataset that consists of four datasets: HPRD, E.coli, C.elegan, and Drosophila.

KW - Network embedding

KW - Protein-protein interaction

KW - Representation learning

KW - Variational graph auto-encoder

UR - http://www.scopus.com/inward/record.url?scp=85088514144&partnerID=8YFLogxK

U2 - 10.1186/s12859-020-03646-8

DO - 10.1186/s12859-020-03646-8

M3 - Article

C2 - 32693790

AN - SCOPUS:85088514144

SN - 1471-2105

VL - 21

JO - BMC Bioinformatics

JF - BMC Bioinformatics

IS - 1

M1 - 323

ER -

Graph-based prediction of Protein-protein interactions with attributed signed graph embedding

Abstract

Keywords

Access to Document

Other files and links

Fingerprint

Cite this