TY - GEN
T1 - PSMC
T2 - 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024
AU - Lin, Longlong
AU - Jia, Tao
AU - Wang, Zeli
AU - Zhao, Jin
AU - Li, Rong Hua
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/8/24
Y1 - 2024/8/24
N2 - Higher-order graph clustering aims to partition the graph using frequently occurring subgraphs (i.e., motifs), instead of the lower-order edges, as the atomic clustering unit, which has been recognized as the state-of-the-art solution in ground truth community detection and knowledge discovery. Motif conductance is one of the most promising higher-order graph clustering models due to its strong interpretability. However, existing motif conductance based graph clustering algorithms are mainly limited by a seminal two-stage reweighting computing framework, needing to enumerate all motif instances to obtain an edge-weighted graph for partitioning. However, such a framework has two-fold vital defects: (1) It can only provide a quadratic bound for the motif with three vertices, and whether there is provable clustering quality for other motifs is still an open question. (2) The enumeration procedure of motif instances incurs prohibitively high costs against large motifs or large dense graphs due to combinatorial explosions. Besides, expensive spectral clustering or local graph diffusion on the edge-weighted graph also makes existing methods unable to handle massive graphs with millions of nodes. To overcome these dilemmas, we propose a Provable and Scalable Motif Conductance algorithm PSMC, which has a fixed and motif-independent approximation ratio for any motif. Specifically, PSMC first defines a new vertex metric Motif Resident based on the given motif, which can be computed locally. Then, it iteratively deletes the vertex with the smallest motif resident value very efficiently using novel dynamic update technologies. Finally, it outputs the locally optimal result during the above iterative process. To further boost efficiency, we propose several effective bounds to estimate the motif resident value of each vertex, which can greatly reduce computational costs. Empirical results on real-life and synthetic demonstrate that our proposed algorithms achieve 3.2-32 times speedup and improve the quality by at least 12 times than the state-of-the art baselines.
AB - Higher-order graph clustering aims to partition the graph using frequently occurring subgraphs (i.e., motifs), instead of the lower-order edges, as the atomic clustering unit, which has been recognized as the state-of-the-art solution in ground truth community detection and knowledge discovery. Motif conductance is one of the most promising higher-order graph clustering models due to its strong interpretability. However, existing motif conductance based graph clustering algorithms are mainly limited by a seminal two-stage reweighting computing framework, needing to enumerate all motif instances to obtain an edge-weighted graph for partitioning. However, such a framework has two-fold vital defects: (1) It can only provide a quadratic bound for the motif with three vertices, and whether there is provable clustering quality for other motifs is still an open question. (2) The enumeration procedure of motif instances incurs prohibitively high costs against large motifs or large dense graphs due to combinatorial explosions. Besides, expensive spectral clustering or local graph diffusion on the edge-weighted graph also makes existing methods unable to handle massive graphs with millions of nodes. To overcome these dilemmas, we propose a Provable and Scalable Motif Conductance algorithm PSMC, which has a fixed and motif-independent approximation ratio for any motif. Specifically, PSMC first defines a new vertex metric Motif Resident based on the given motif, which can be computed locally. Then, it iteratively deletes the vertex with the smallest motif resident value very efficiently using novel dynamic update technologies. Finally, it outputs the locally optimal result during the above iterative process. To further boost efficiency, we propose several effective bounds to estimate the motif resident value of each vertex, which can greatly reduce computational costs. Empirical results on real-life and synthetic demonstrate that our proposed algorithms achieve 3.2-32 times speedup and improve the quality by at least 12 times than the state-of-the art baselines.
KW - higher-order graph clustering
KW - motif conductance
UR - http://www.scopus.com/inward/record.url?scp=85203714130&partnerID=8YFLogxK
U2 - 10.1145/3637528.3671666
DO - 10.1145/3637528.3671666
M3 - Conference contribution
AN - SCOPUS:85203714130
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1793
EP - 1803
BT - KDD 2024 - Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 25 August 2024 through 29 August 2024
ER -