GPPR: 跨域分布式个性化 PageRank 算法

Zi Jun Chen, De Long Ma, Yi Shu Wang, Ye Yuan*

*此作品的通讯作者

科研成果: 期刊稿件文章同行评审

摘要

Personalized PageRank, as a basic algorithm in large graph analysis, has a wide range of applications in search engines, social recommendation, community detection, and other fields, and has been a hot problem of interest to researchers. The existing distributed personalized PageRank algorithms assume that all data are located in the same geographic location and the network environment is the same among the computing nodes where the data are located. However, in the real world, these data may be distributed in multiple data centers across continents, and these cross-geo-distributed data centers are connected to each other through WANs, which are characterized by heterogeneous network bandwidth, huge hardware differences, and high communication costs. The distributed personalized PageRank algorithm requires multiple iterations and random wandering on the global graph. Therefore, the existing distributed personalized PageRank algorithms are not applicable to the cross-geo-distributed environment. To address this problem, the GPPR (cross-geo-distributed personalized PageRank) algorithm is proposed in this study. The algorithm first preprocesses the big graph data in the cross-geo-distributed environment and maps the graph data by using a heuristic algorithm to reduce the impact of network bandwidth heterogeneity on the iteration speed of the algorithm. Secondly, GPPR improves the random wandering approach and proposes a probability-based push algorithm to further reduce the number of iterations required by the algorithm by reducing the bandwidth load of transmitting data between working nodes. The GPPR algorithm is implemented based on the Spark framework and a real cross-geo-distributed environment in AliCloud is built to conduct experiments on eight open-source big graph data compared with several existing representative distributed personalized PageRank algorithms. The results show that the communication data volume of GPPR is reduced by 30% on average in the cross-geo-distributed environment compared with other algorithms. In terms of algorithm running efficiency, GPPR improves by an average of 2.5 times compared to other algorithms.

投稿的翻译标题GPPR: Cross-geo-distributed Personalized PageRank Algorithm
源语言繁体中文
页(从-至)1090-1106
页数17
期刊Ruan Jian Xue Bao/Journal of Software
35
3
DOI
出版状态已出版 - 3月 2024

关键词

  • approximate calculation
  • cross-geo-distributed
  • personalized PageRank

指纹

探究 'GPPR: 跨域分布式个性化 PageRank 算法' 的科研主题。它们共同构成独一无二的指纹。

引用此

Chen, Z. J., Ma, D. L., Wang, Y. S., & Yuan, Y. (2024). GPPR: 跨域分布式个性化 PageRank 算法. Ruan Jian Xue Bao/Journal of Software, 35(3), 1090-1106. https://doi.org/10.13328/j.cnki.jos.007072