Self-Training GNN-based Community Search in Large Attributed Heterogeneous Information Networks

Yuan Li, Xiuxu Chen, Yuhai Zhao*, Wen Shan, Zhengkui Wang, Guoli Yang, Guoren Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Attributed Heterogeneous Information Networks (AHINs) amalgamate the advantages of attributed graphs (AGs) and heterogeneous information networks (HINs) to model intricate systems. Within this context, community search, which aims to identify the most probable community containing the queried vertex, has been extensively explored in AGs and HINs. However, existing methodologies fall short in simultaneously accommodating heterogeneous attributes and multiple meta-paths in AHINs, posing a substantial challenge in investigating community search within expansive AHINs. Recent studies highlight the efficacy of machine learning-based community search, offering enhanced flexibility and higher-quality communities in comparison to traditional structural-based methods. Yet, semi-supervised learning methods demand substantial labeled data and incur considerable memory and time costs when applied to large AHINs. To tackle these challenges, we propose an MK (Most-likely; K-sized) community search approach. This approach involves defining an MK community and leveraging Graph Neural Networks (GNNs) to amalgamate structures and attributes into a unified goodness metric. Our methodology involves training on local subgraphs sampled via guided random walks based on multiple meta-paths, circumventing the need for training on the entire graph. Moreover, attention-based GNNs adeptly learn meta-path weights to guide weighted walks in subsequent iterations. Additionally, self-training is employed to alleviate the labeling burden. We also demonstrate that pinpointing the location for the MK community is NP-hard and present a heuristic local search strategy that expedites the resolution process through rewriting. Ultimately, the convergence of iterations yields the solution. Extensive experiments conducted on four real-world datasets underscore that the MK framework significantly enhances both effectiveness and efficiency in community search within AHINs.
Our code is publicly available at https://github.com/uucxuu/CSAH.
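
The abstract's idea of sampling a local subgraph with meta-path-guided weighted random walks can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the function names (`build_adjacency`, `metapath_walk`, `sample_local_subgraph`), the uniform choice among matching neighbors, and the fixed meta-path weights are all assumptions made for illustration. In the paper the weights are learned by attention-based GNNs; here they are simply passed in.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, walk_len, rng):
    """Random walk that only steps to neighbors whose type matches the
    next slot of a symmetric meta-path such as ("A", "P", "A")."""
    if node_type[start] != metapath[0]:
        return [start]
    walk = [start]
    period = len(metapath) - 1  # the pattern repeats after this many steps
    for step in range(1, walk_len + 1):
        wanted = metapath[step % period]
        # Sort for reproducibility; the choice among candidates is uniform here.
        candidates = sorted(n for n in adj[walk[-1]] if node_type[n] == wanted)
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


def sample_local_subgraph(adj, node_type, query, metapaths, weights,
                          num_walks, walk_len, seed=0):
    """Run several walks from the query vertex, picking a meta-path for each
    walk in proportion to `weights`, and return the induced local subgraph."""
    rng = random.Random(seed)
    nodes = {query}
    for _ in range(num_walks):
        mp = rng.choices(metapaths, weights=weights, k=1)[0]
        nodes.update(metapath_walk(adj, node_type, query, mp, walk_len, rng))
    edges = {(u, v) for u in nodes for v in adj[u] if v in nodes and u < v}
    return nodes, edges


# Hypothetical toy network: authors (A), papers (P), one venue (V).
node_type = {"a1": "A", "a2": "A", "a3": "A", "p1": "P", "p2": "P", "v1": "V"}
adj = build_adjacency([("a1", "p1"), ("a2", "p1"), ("a2", "p2"),
                       ("a3", "p2"), ("p1", "v1"), ("p2", "v1")])
metapaths = [("A", "P", "A"), ("A", "P", "V", "P", "A")]

# Assumed weights; in the paper's setting these would come from attention scores.
nodes, edges = sample_local_subgraph(adj, node_type, "a1", metapaths,
                                     weights=[0.7, 0.3], num_walks=5, walk_len=4)
print(sorted(nodes), sorted(edges))
```

Training would then run on the returned nodes and edges rather than on the whole graph, and updated weights could be fed back into the next round of sampling.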

Original language: English
Title of host publication: Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
Publisher: IEEE Computer Society
Pages: 2765-2778
Number of pages: 14
ISBN (Electronic): 9798350317152
DOIs
Publication status: Published - 2024
Event: 40th IEEE International Conference on Data Engineering, ICDE 2024 - Utrecht, Netherlands
Duration: 13 May 2024 → 17 May 2024

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627
ISSN (Electronic): 2375-0286

Conference

Conference: 40th IEEE International Conference on Data Engineering, ICDE 2024
Country/Territory: Netherlands
City: Utrecht
Period: 13/05/24 → 17/05/24

Keywords

  • Attributed Heterogeneous Information Networks
  • Community Search
  • Graph Neural Network
