TY - GEN
T1 - Self-Training GNN-based Community Search in Large Attributed Heterogeneous Information Networks
AU - Li, Yuan
AU - Chen, Xiuxu
AU - Zhao, Yuhai
AU - Shan, Wen
AU - Wang, Zhengkui
AU - Yang, Guoli
AU - Wang, Guoren
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Attributed Heterogeneous Information Networks (AHINs) amalgamate the advantages of attributed graphs (AGs) and heterogeneous information networks (HINs) to model intricate systems. Within this context, community search, which aims to identify the most probable community containing the queried vertex, has been extensively explored in AGs and HINs. However, existing methodologies fall short in simultaneously accommodating heterogeneous attributes and multiple meta-paths in AHINs, posing a substantial challenge in investigating community search within expansive AHINs. Recent studies highlight the efficacy of machine learning-based community search, offering enhanced flexibility and higher-quality communities in comparison to traditional structural-based methods. Yet, semi-supervised learning methods demand substantial labeled data and incur considerable memory and time costs when applied to large AHINs. To tackle these challenges, we propose an MK (Most-likely; K-sized) community search approach. This approach involves defining an MK community and leveraging Graph Neural Networks (GNNs) to amalgamate structures and attributes into a unified goodness metric. Our methodology involves training on local subgraphs sampled via guided random walks based on multiple meta-paths, circumventing the need for training on the entire graph. Moreover, attention-based GNNs adeptly learn meta-path weights to guide weighted walks in subsequent iterations. Additionally, self-training is employed to alleviate the labeling burden. We also demonstrate that pinpointing the location for the MK community is NP-hard and present a heuristic local search strategy that expedites the resolution process through rewriting. Ultimately, the convergence of iterations yields the solution. Extensive experiments conducted on four real-world datasets underscore that the MK framework significantly enhances both effectiveness and efficiency in community search within AHINs. Our code is publicly available at https://github.com/uucxuu/CSAH.
AB - Attributed Heterogeneous Information Networks (AHINs) amalgamate the advantages of attributed graphs (AGs) and heterogeneous information networks (HINs) to model intricate systems. Within this context, community search, which aims to identify the most probable community containing the queried vertex, has been extensively explored in AGs and HINs. However, existing methodologies fall short in simultaneously accommodating heterogeneous attributes and multiple meta-paths in AHINs, posing a substantial challenge in investigating community search within expansive AHINs. Recent studies highlight the efficacy of machine learning-based community search, offering enhanced flexibility and higher-quality communities in comparison to traditional structural-based methods. Yet, semi-supervised learning methods demand substantial labeled data and incur considerable memory and time costs when applied to large AHINs. To tackle these challenges, we propose an MK (Most-likely; K-sized) community search approach. This approach involves defining an MK community and leveraging Graph Neural Networks (GNNs) to amalgamate structures and attributes into a unified goodness metric. Our methodology involves training on local subgraphs sampled via guided random walks based on multiple meta-paths, circumventing the need for training on the entire graph. Moreover, attention-based GNNs adeptly learn meta-path weights to guide weighted walks in subsequent iterations. Additionally, self-training is employed to alleviate the labeling burden. We also demonstrate that pinpointing the location for the MK community is NP-hard and present a heuristic local search strategy that expedites the resolution process through rewriting. Ultimately, the convergence of iterations yields the solution. Extensive experiments conducted on four real-world datasets underscore that the MK framework significantly enhances both effectiveness and efficiency in community search within AHINs. Our code is publicly available at https://github.com/uucxuu/CSAH.
KW - Attributed Heterogeneous Information Networks
KW - Community Search
KW - Graph Neural Network
UR - http://www.scopus.com/inward/record.url?scp=85200453339&partnerID=8YFLogxK
U2 - 10.1109/ICDE60146.2024.00216
DO - 10.1109/ICDE60146.2024.00216
M3 - Conference contribution
AN - SCOPUS:85200453339
T3 - Proceedings - International Conference on Data Engineering
SP - 2765
EP - 2778
BT - Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
PB - IEEE Computer Society
T2 - 40th IEEE International Conference on Data Engineering, ICDE 2024
Y2 - 13 May 2024 through 17 May 2024
ER -