TY - JOUR
T1 - Cost-Aware Triangle Counting Over Geo-Distributed Datacenters
AU - Ma, Delong
AU - Yuan, Ye
AU - Zhang, Yanfeng
AU - Cao, Chunze
AU - Ma, Yuliang
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2025
Y1 - 2025
N2 - Counting triangles is an important topic in many practical applications, such as anomaly detection, community search, and recommendation systems. For triangle counting in large and dynamic graphs, recent work has focused on distributed streaming algorithms. These works assume that the graph is processed in the same location, while in reality, the graph stream may be generated and processed in datacenters that are geographically distributed. This raises new challenges to existing triangle counting algorithms, due to the multi-level heterogeneities in network bandwidth and communication prices in geo-distributed datacenters. In this article, we propose a cost-aware framework named GeoTri based on the Master-Worker-Aggregator architecture, which takes both the cost and performance objectives into consideration for triangle counting in geo-distributed datacenters. The two core parts of this framework are the cost-aware nodes assignment strategy in master, which is critical to obtain node’s position and distribute edges reasonably to reduce the cost (i.e., time cost and monetary cost), and cost-aware neighbor transfer strategy among workers, which further eliminates redundancy in data transfers.
AB - Counting triangles is an important topic in many practical applications, such as anomaly detection, community search, and recommendation systems. For triangle counting in large and dynamic graphs, recent work has focused on distributed streaming algorithms. These works assume that the graph is processed in the same location, while in reality, the graph stream may be generated and processed in datacenters that are geographically distributed. This raises new challenges to existing triangle counting algorithms, due to the multi-level heterogeneities in network bandwidth and communication prices in geo-distributed datacenters. In this article, we propose a cost-aware framework named GeoTri based on the Master-Worker-Aggregator architecture, which takes both the cost and performance objectives into consideration for triangle counting in geo-distributed datacenters. The two core parts of this framework are the cost-aware nodes assignment strategy in master, which is critical to obtain node’s position and distribute edges reasonably to reduce the cost (i.e., time cost and monetary cost), and cost-aware neighbor transfer strategy among workers, which further eliminates redundancy in data transfers.
KW - Geo-distributed datacenters
KW - graph stream
KW - triangle counting
KW - wide area network
UR - https://www.scopus.com/pages/publications/85213445323
U2 - 10.1109/TBDATA.2024.3522816
DO - 10.1109/TBDATA.2024.3522816
M3 - Article
AN - SCOPUS:85213445323
SN - 2332-7790
VL - 11
SP - 2008
EP - 2024
JO - IEEE Transactions on Big Data
JF - IEEE Transactions on Big Data
IS - 4
ER -