DTransE: Distributed Translating Embedding for Knowledge Graph

Dandan Song; Feng Zhang; Meiyan Lu; Sicheng Yang; Heyan Huang

doi:10.1109/TPDS.2021.3066442

DTransE: Distributed Translating Embedding for Knowledge Graph

Dandan Song^*, Feng Zhang, Meiyan Lu, Sicheng Yang, Heyan Huang

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

Knowledge graphs play an important role in many applications, such as link prediction and question answering. Translating embedding for knowledge graphs is done with the aim of encoding structured information on entities and their rich relations in a low-dimensional embedding space. TransE is one of the most important methods in translation-based models, and uses translation invariance to implement translating embedding for knowledge graphs. In this line of work, translating embedding models represent the relation as a translation from the head entity to the tail entity and have achieved impressive results. Currently, the TransE model is only developed on single-node machines. Unfortunately, the computing and storage capacities of a single machine can easily reach their limits as knowledge graphs become larger and more complex, which limits the application scope of TransE. In order to solve this problem, we propose a distributed TransE method, known as DTransE, which can utilize distributed computing resources to calculate knowledge graph embeddings. However, building a distributed TransE is complicated and involves challenges of knowledge graph partitioning and computation. To solve these challenges, we provide a high-quality edge partitioning algorithm for the power-law graph by considering the high-degree and low-degree vertices with adaptive weights, which can balance the workload. By using the unactivated Gather-Apply-Scatter model on TransE, the processes periodically exchange messages in a loop. The irregular data distribution among the processes is also optimized to further accelerate communication. As far as we know, this is the first work on a distributed TransE method. We use link prediction to evaluate the DTransE in a distributed environment. Experiments show that, compared to the original TransE method, our proposed DTransE is, on average, 24.5 times faster with a minimum loss of accuracy; compared to the state-of-the-art parallel TransE implementation, DTransE is two times faster on average.

Original language	English
Article number	9380698
Pages (from-to)	2509-2523
Number of pages	15
Journal	IEEE Transactions on Parallel and Distributed Systems
Volume	32
Issue number	10
DOIs	https://doi.org/10.1109/TPDS.2021.3066442
Publication status	Published - 1 Oct 2021

Keywords

TransE
distributed computing
knowledge graph and representation
workload partitioning

Access to Document

10.1109/TPDS.2021.3066442

Cite this

@article{5b08a70c55394efeab8306258aa83569,

title = "DTransE: Distributed Translating Embedding for Knowledge Graph",

abstract = "Knowledge graphs play an important role in many applications, such as link prediction and question answering. Translating embedding for knowledge graphs is done with the aim of encoding structured information on entities and their rich relations in a low-dimensional embedding space. TransE is one of the most important methods in translation-based models, and uses translation invariance to implement translating embedding for knowledge graphs. In this line of work, translating embedding models represent the relation as a translation from the head entity to the tail entity and have achieved impressive results. Currently, the TransE model is only developed on single-node machines. Unfortunately, the computing and storage capacities of a single machine can easily reach their limits as knowledge graphs become larger and more complex, which limits the application scope of TransE. In order to solve this problem, we propose a distributed TransE method, known as DTransE, which can utilize distributed computing resources to calculate knowledge graph embeddings. However, building a distributed TransE is complicated and involves challenges of knowledge graph partitioning and computation. To solve these challenges, we provide a high-quality edge partitioning algorithm for the power-law graph by considering the high-degree and low-degree vertices with adaptive weights, which can balance the workload. By using the unactivated Gather-Apply-Scatter model on TransE, the processes periodically exchange messages in a loop. The irregular data distribution among the processes is also optimized to further accelerate communication. As far as we know, this is the first work on a distributed TransE method. We use link prediction to evaluate the DTransE in a distributed environment. Experiments show that, compared to the original TransE method, our proposed DTransE is, on average, 24.5 times faster with a minimum loss of accuracy; compared to the state-of-the-art parallel TransE implementation, DTransE is two times faster on average.",

keywords = "TransE, distributed computing, knowledge graph and representation, workload partitioning",

author = "Dandan Song and Feng Zhang and Meiyan Lu and Sicheng Yang and Heyan Huang",

note = "Publisher Copyright: {\textcopyright} 1990-2012 IEEE.",

year = "2021",

month = oct,

day = "1",

doi = "10.1109/TPDS.2021.3066442",

language = "English",

volume = "32",

pages = "2509--2523",

journal = "IEEE Transactions on Parallel and Distributed Systems",

issn = "1045-9219",

publisher = "IEEE Computer Society",

number = "10",

}

TY - JOUR

T1 - DTransE

T2 - Distributed Translating Embedding for Knowledge Graph

AU - Song, Dandan

AU - Zhang, Feng

AU - Lu, Meiyan

AU - Yang, Sicheng

AU - Huang, Heyan

PY - 2021/10/1

Y1 - 2021/10/1

N2 - Knowledge graphs play an important role in many applications, such as link prediction and question answering. Translating embedding for knowledge graphs is done with the aim of encoding structured information on entities and their rich relations in a low-dimensional embedding space. TransE is one of the most important methods in translation-based models, and uses translation invariance to implement translating embedding for knowledge graphs. In this line of work, translating embedding models represent the relation as a translation from the head entity to the tail entity and have achieved impressive results. Currently, the TransE model is only developed on single-node machines. Unfortunately, the computing and storage capacities of a single machine can easily reach their limits as knowledge graphs become larger and more complex, which limits the application scope of TransE. In order to solve this problem, we propose a distributed TransE method, known as DTransE, which can utilize distributed computing resources to calculate knowledge graph embeddings. However, building a distributed TransE is complicated and involves challenges of knowledge graph partitioning and computation. To solve these challenges, we provide a high-quality edge partitioning algorithm for the power-law graph by considering the high-degree and low-degree vertices with adaptive weights, which can balance the workload. By using the unactivated Gather-Apply-Scatter model on TransE, the processes periodically exchange messages in a loop. The irregular data distribution among the processes is also optimized to further accelerate communication. As far as we know, this is the first work on a distributed TransE method. We use link prediction to evaluate the DTransE in a distributed environment. Experiments show that, compared to the original TransE method, our proposed DTransE is, on average, 24.5 times faster with a minimum loss of accuracy; compared to the state-of-the-art parallel TransE implementation, DTransE is two times faster on average.

AB - Knowledge graphs play an important role in many applications, such as link prediction and question answering. Translating embedding for knowledge graphs is done with the aim of encoding structured information on entities and their rich relations in a low-dimensional embedding space. TransE is one of the most important methods in translation-based models, and uses translation invariance to implement translating embedding for knowledge graphs. In this line of work, translating embedding models represent the relation as a translation from the head entity to the tail entity and have achieved impressive results. Currently, the TransE model is only developed on single-node machines. Unfortunately, the computing and storage capacities of a single machine can easily reach their limits as knowledge graphs become larger and more complex, which limits the application scope of TransE. In order to solve this problem, we propose a distributed TransE method, known as DTransE, which can utilize distributed computing resources to calculate knowledge graph embeddings. However, building a distributed TransE is complicated and involves challenges of knowledge graph partitioning and computation. To solve these challenges, we provide a high-quality edge partitioning algorithm for the power-law graph by considering the high-degree and low-degree vertices with adaptive weights, which can balance the workload. By using the unactivated Gather-Apply-Scatter model on TransE, the processes periodically exchange messages in a loop. The irregular data distribution among the processes is also optimized to further accelerate communication. As far as we know, this is the first work on a distributed TransE method. We use link prediction to evaluate the DTransE in a distributed environment. Experiments show that, compared to the original TransE method, our proposed DTransE is, on average, 24.5 times faster with a minimum loss of accuracy; compared to the state-of-the-art parallel TransE implementation, DTransE is two times faster on average.

KW - TransE

KW - distributed computing

KW - knowledge graph and representation

KW - workload partitioning

UR - http://www.scopus.com/inward/record.url?scp=85103211366&partnerID=8YFLogxK

U2 - 10.1109/TPDS.2021.3066442

DO - 10.1109/TPDS.2021.3066442

M3 - Article

AN - SCOPUS:85103211366

SN - 1045-9219

VL - 32

SP - 2509

EP - 2523

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

IS - 10

M1 - 9380698

ER -

DTransE: Distributed Translating Embedding for Knowledge Graph

Abstract

Keywords

Access to Document

Other files and links

Fingerprint

Cite this