TY - GEN
T1 - Maximal Biclique Enumeration
T2 - 40th IEEE International Conference on Data Engineering, ICDE 2024
AU - Chen, Jiujian
AU - Wang, Kai
AU - Li, Rong Hua
AU - Qin, Hongchao
AU - Lin, Xuemin
AU - Wang, Guoren
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Bipartite graphs are commonly used to model relationships between two distinct types of entities, such as customer-product relationships in e-commerce platforms and protein-protein interactions in bioinformatics. Enumerating all maximal bicliques from a bipartite graph is a fundamental graph mining problem that has been widely used in many real-world applications including community search and spam detection. Existing algorithms for maximal biclique enumeration can struggle to scale to large graphs with a vast number of maximal bicliques. In this paper, we propose a novel and highly-efficient algorithm for maximal biclique enumeration in bipartite graphs using prefix trees. Specifically, a prefix tree is a data structure that stores lists of elements as paths in the tree, and we observe that a maximal biclique can be represented uniquely by the vertices in one of its vertex layers and stored compactly in prefix trees. The process of our algorithm is divided into two steps. First, we find the lower layer vertices of all maximal bicliques and organize them in a prefix tree (i.e., the result tree). During this step, we transform the original time-consuming operations of checking maximality and filtering candidates for vertex sets into determining uniqueness and performing extraction from a prefix tree at each level of the recursion. Second, we use the result tree to obtain the upper layer vertices of the maximal bicliques by computing the common neighbors of vertices in the tree. In this step, we further optimize the computation for intersections of vertex sets by compressing the neighbors of each vertex and memoization. In addition, we also propose a pre-processing method based on the order of traversal on the prefix tree to reduce memory usage. We conduct extensive experiments on 10 real-world datasets, and the results demonstrate that the proposed algorithm outperforms existing solutions by up to one order of magnitude.
AB - Bipartite graphs are commonly used to model relationships between two distinct types of entities, such as customer-product relationships in e-commerce platforms and protein-protein interactions in bioinformatics. Enumerating all maximal bicliques from a bipartite graph is a fundamental graph mining problem that has been widely used in many real-world applications including community search and spam detection. Existing algorithms for maximal biclique enumeration can struggle to scale to large graphs with a vast number of maximal bicliques. In this paper, we propose a novel and highly-efficient algorithm for maximal biclique enumeration in bipartite graphs using prefix trees. Specifically, a prefix tree is a data structure that stores lists of elements as paths in the tree, and we observe that a maximal biclique can be represented uniquely by the vertices in one of its vertex layers and stored compactly in prefix trees. The process of our algorithm is divided into two steps. First, we find the lower layer vertices of all maximal bicliques and organize them in a prefix tree (i.e., the result tree). During this step, we transform the original time-consuming operations of checking maximality and filtering candidates for vertex sets into determining uniqueness and performing extraction from a prefix tree at each level of the recursion. Second, we use the result tree to obtain the upper layer vertices of the maximal bicliques by computing the common neighbors of vertices in the tree. In this step, we further optimize the computation for intersections of vertex sets by compressing the neighbors of each vertex and memoization. In addition, we also propose a pre-processing method based on the order of traversal on the prefix tree to reduce memory usage. We conduct extensive experiments on 10 real-world datasets, and the results demonstrate that the proposed algorithm outperforms existing solutions by up to one order of magnitude.
KW - bipartite graph
KW - graph mining
KW - maximal biclique
KW - prefix tree
UR - http://www.scopus.com/inward/record.url?scp=85200471284&partnerID=8YFLogxK
U2 - 10.1109/ICDE60146.2024.00200
DO - 10.1109/ICDE60146.2024.00200
M3 - Conference contribution
AN - SCOPUS:85200471284
T3 - Proceedings - International Conference on Data Engineering
SP - 2544
EP - 2556
BT - Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
PB - IEEE Computer Society
Y2 - 13 May 2024 through 17 May 2024
ER -