Maximal Biclique Enumeration: A Prefix Tree Based Approach

Jiujian Chen; Kai Wang; Rong Hua Li; Hongchao Qin; Xuemin Lin; Guoren Wang

doi:10.1109/ICDE60146.2024.00200

Jiujian Chen, Kai Wang, Rong Hua Li^*, Hongchao Qin, Xuemin Lin, Guoren Wang

^*Corresponding author for this work

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

1 Citation (Scopus)

Abstract

Bipartite graphs are commonly used to model relationships between two distinct types of entities, such as customer-product relationships in e-commerce platforms and protein-protein interactions in bioinformatics. Enumerating all maximal bicliques from a bipartite graph is a fundamental graph mining problem that has been widely used in many real-world applications including community search and spam detection. Existing algorithms for maximal biclique enumeration can struggle to scale to large graphs with a vast number of maximal bicliques. In this paper, we propose a novel and highly-efficient algorithm for maximal biclique enumeration in bipartite graphs using prefix trees. Specifically, a prefix tree is a data structure that stores lists of elements as paths in the tree, and we observe that a maximal biclique can be represented uniquely by the vertices in one of its vertex layers and stored compactly in prefix trees. The process of our algorithm is divided into two steps. First, we find the lower layer vertices of all maximal bicliques and organize them in a prefix tree (i.e., the result tree). During this step, we transform the original time-consuming operations of checking maximality and filtering candidates for vertex sets into determining uniqueness and performing extraction from a prefix tree at each level of the recursion. Second, we use the result tree to obtain the upper layer vertices of the maximal bicliques by computing the common neighbors of vertices in the tree. In this step, we further optimize the computation for intersections of vertex sets by compressing the neighbors of each vertex and memoization. In addition, we also propose a pre-processing method based on the order of traversal on the prefix tree to reduce memory usage. We conduct extensive experiments on 10 real-world datasets, and the results demonstrate that the proposed algorithm outperforms existing solutions by up to one order of magnitude.
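The two-step idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a brute-force closure over neighbor sets in place of the paper's recursion, and every name in it (`PrefixTree`, `build_result_tree`, `walk`, `maximal_bicliques`) is hypothetical. It only shows how a prefix tree can store each maximal biclique once by its lower-layer vertices, and how the upper layer can be recovered by intersecting neighbor sets along a tree path so that shared prefixes are intersected only once.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Optional, Set, Tuple


class PrefixTree:
    """Stores sorted vertex lists as root-to-node paths."""

    def __init__(self) -> None:
        self.children: Dict[int, "PrefixTree"] = {}
        self.is_end = False  # True if a stored vertex set ends at this node

    def insert(self, vertices: Iterable[int]) -> bool:
        """Insert a vertex set; return True only if it was not stored before."""
        node = self
        for v in sorted(vertices):
            node = node.children.setdefault(v, PrefixTree())
        is_new = not node.is_end
        node.is_end = True
        return is_new


def build_result_tree(adj_upper: Dict[Hashable, Set[int]]) -> PrefixTree:
    """Step 1 (brute-force stand-in): collect every non-empty intersection of
    upper-vertex neighbor sets; the tree's uniqueness check removes duplicates."""
    tree = PrefixTree()
    neighborhoods = [frozenset(n) for n in adj_upper.values() if n]
    frontier: List[frozenset] = []
    for s in neighborhoods:
        if tree.insert(s):
            frontier.append(s)
    while frontier:
        current = frontier.pop()
        for nbrs in neighborhoods:
            candidate = current & nbrs
            if candidate and tree.insert(candidate):
                frontier.append(candidate)
    return tree


def walk(
    node: PrefixTree,
    adj_lower: Dict[int, Set[Hashable]],
    prefix: Tuple[int, ...] = (),
    common: Optional[Set[Hashable]] = None,
) -> Iterator[Tuple[frozenset, Tuple[int, ...]]]:
    """Step 2: depth-first traversal carrying the common neighbors of the
    current prefix, so a shared prefix is intersected once for all extensions."""
    if node.is_end:
        yield frozenset(common), prefix
    for v in sorted(node.children):
        shared = set(adj_lower[v]) if common is None else common & adj_lower[v]
        yield from walk(node.children[v], adj_lower, prefix + (v,), shared)


def maximal_bicliques(adj_upper: Dict[Hashable, Set[int]]):
    """Return (upper layer, lower layer) pairs for a small bipartite graph."""
    adj_lower: Dict[int, Set[Hashable]] = {}
    for u, nbrs in adj_upper.items():
        for v in nbrs:
            adj_lower.setdefault(v, set()).add(u)
    return list(walk(build_result_tree(adj_upper), adj_lower))


if __name__ == "__main__":
    graph = {"a": {1, 2}, "b": {1, 2, 3}, "c": {2, 3}}
    for upper, lower in maximal_bicliques(graph):
        print(sorted(upper), lower)
```

The closure loop is exponential in the worst case and is meant only for toy inputs; the paper's contribution is precisely to avoid this kind of exhaustive work.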

Original language	English
Title of host publication	Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
Publisher	IEEE Computer Society
Pages	2544-2556
Number of pages	13
ISBN (Electronic)	9798350317152
DOI	https://doi.org/10.1109/ICDE60146.2024.00200
Publication status	Published - 2024
Event	40th IEEE International Conference on Data Engineering, ICDE 2024 - Utrecht, Netherlands. Duration: 13 May 2024 → 17 May 2024

Publication series

Name	Proceedings - International Conference on Data Engineering
ISSN (Print)	1084-4627
ISSN (Electronic)	2375-0286

Conference

Conference	40th IEEE International Conference on Data Engineering, ICDE 2024
Country/Territory	Netherlands
City	Utrecht
Period	13/05/24 → 17/05/24

Access to document

10.1109/ICDE60146.2024.00200

Other files and links

Link to publication in Scopus

Cite this

Chen, J., Wang, K., Li, R. H., Qin, H., Lin, X., & Wang, G. (2024). Maximal Biclique Enumeration: A Prefix Tree Based Approach. In Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024 (pp. 2544-2556). (Proceedings - International Conference on Data Engineering). IEEE Computer Society. https://doi.org/10.1109/ICDE60146.2024.00200

@inproceedings{6bbefd777b1e4db1918b1003a5ffe970,

title = "Maximal Biclique Enumeration: A Prefix Tree Based Approach",

abstract = "Bipartite graphs are commonly used to model relationships between two distinct types of entities, such as customer-product relationships in e-commerce platforms and protein-protein interactions in bioinformatics. Enumerating all maximal bicliques from a bipartite graph is a fundamental graph mining problem that has been widely used in many real-world applications including community search and spam detection. Existing algorithms for maximal biclique enumeration can struggle to scale to large graphs with a vast number of maximal bicliques. In this paper, we propose a novel and highly-efficient algorithm for maximal biclique enumeration in bipartite graphs using prefix trees. Specifically, a prefix tree is a data structure that stores lists of elements as paths in the tree, and we observe that a maximal biclique can be represented uniquely by the vertices in one of its vertex layers and stored compactly in prefix trees. The process of our algorithm is divided into two steps. First, we find the lower layer vertices of all maximal bicliques and organize them in a prefix tree (i.e., the result tree). During this step, we transform the original time-consuming operations of checking maximality and filtering candidates for vertex sets into determining uniqueness and performing extraction from a prefix tree at each level of the recursion. Second, we use the result tree to obtain the upper layer vertices of the maximal bicliques by computing the common neighbors of vertices in the tree. In this step, we further optimize the computation for intersections of vertex sets by compressing the neighbors of each vertex and memoization. In addition, we also propose a pre-processing method based on the order of traversal on the prefix tree to reduce memory usage. We conduct extensive experiments on 10 real-world datasets, and the results demonstrate that the proposed algorithm outperforms existing solutions by up to one order of magnitude.",

keywords = "bipartite graph, graph mining, maximal biclique, prefix tree",

author = "Jiujian Chen and Kai Wang and Li, {Rong Hua} and Hongchao Qin and Xuemin Lin and Guoren Wang",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 40th IEEE International Conference on Data Engineering, ICDE 2024 ; Conference date: 13-05-2024 Through 17-05-2024",

year = "2024",

doi = "10.1109/ICDE60146.2024.00200",

language = "English",

series = "Proceedings - International Conference on Data Engineering",

publisher = "IEEE Computer Society",

pages = "2544--2556",

booktitle = "Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024",

address = "United States",

}

Chen, J, Wang, K, Li, RH, Qin, H, Lin, X & Wang, G 2024, Maximal Biclique Enumeration: A Prefix Tree Based Approach. in Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024. Proceedings - International Conference on Data Engineering, IEEE Computer Society, pp. 2544-2556, 40th IEEE International Conference on Data Engineering, ICDE 2024, Utrecht, Netherlands, 13/05/24. https://doi.org/10.1109/ICDE60146.2024.00200

Maximal Biclique Enumeration: A Prefix Tree Based Approach. / Chen, Jiujian; Wang, Kai; Li, Rong Hua et al.
Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024. IEEE Computer Society, 2024. p. 2544-2556 (Proceedings - International Conference on Data Engineering).

TY - GEN

T1 - Maximal Biclique Enumeration: A Prefix Tree Based Approach

T2 - 40th IEEE International Conference on Data Engineering, ICDE 2024

AU - Chen, Jiujian

AU - Wang, Kai

AU - Li, Rong Hua

AU - Qin, Hongchao

AU - Lin, Xuemin

AU - Wang, Guoren

PY - 2024

Y1 - 2024

N2 - Bipartite graphs are commonly used to model relationships between two distinct types of entities, such as customer-product relationships in e-commerce platforms and protein-protein interactions in bioinformatics. Enumerating all maximal bicliques from a bipartite graph is a fundamental graph mining problem that has been widely used in many real-world applications including community search and spam detection. Existing algorithms for maximal biclique enumeration can struggle to scale to large graphs with a vast number of maximal bicliques. In this paper, we propose a novel and highly-efficient algorithm for maximal biclique enumeration in bipartite graphs using prefix trees. Specifically, a prefix tree is a data structure that stores lists of elements as paths in the tree, and we observe that a maximal biclique can be represented uniquely by the vertices in one of its vertex layers and stored compactly in prefix trees. The process of our algorithm is divided into two steps. First, we find the lower layer vertices of all maximal bicliques and organize them in a prefix tree (i.e., the result tree). During this step, we transform the original time-consuming operations of checking maximality and filtering candidates for vertex sets into determining uniqueness and performing extraction from a prefix tree at each level of the recursion. Second, we use the result tree to obtain the upper layer vertices of the maximal bicliques by computing the common neighbors of vertices in the tree. In this step, we further optimize the computation for intersections of vertex sets by compressing the neighbors of each vertex and memoization. In addition, we also propose a pre-processing method based on the order of traversal on the prefix tree to reduce memory usage. We conduct extensive experiments on 10 real-world datasets, and the results demonstrate that the proposed algorithm outperforms existing solutions by up to one order of magnitude.

AB - Bipartite graphs are commonly used to model relationships between two distinct types of entities, such as customer-product relationships in e-commerce platforms and protein-protein interactions in bioinformatics. Enumerating all maximal bicliques from a bipartite graph is a fundamental graph mining problem that has been widely used in many real-world applications including community search and spam detection. Existing algorithms for maximal biclique enumeration can struggle to scale to large graphs with a vast number of maximal bicliques. In this paper, we propose a novel and highly-efficient algorithm for maximal biclique enumeration in bipartite graphs using prefix trees. Specifically, a prefix tree is a data structure that stores lists of elements as paths in the tree, and we observe that a maximal biclique can be represented uniquely by the vertices in one of its vertex layers and stored compactly in prefix trees. The process of our algorithm is divided into two steps. First, we find the lower layer vertices of all maximal bicliques and organize them in a prefix tree (i.e., the result tree). During this step, we transform the original time-consuming operations of checking maximality and filtering candidates for vertex sets into determining uniqueness and performing extraction from a prefix tree at each level of the recursion. Second, we use the result tree to obtain the upper layer vertices of the maximal bicliques by computing the common neighbors of vertices in the tree. In this step, we further optimize the computation for intersections of vertex sets by compressing the neighbors of each vertex and memoization. In addition, we also propose a pre-processing method based on the order of traversal on the prefix tree to reduce memory usage. We conduct extensive experiments on 10 real-world datasets, and the results demonstrate that the proposed algorithm outperforms existing solutions by up to one order of magnitude.

KW - bipartite graph

KW - graph mining

KW - maximal biclique

KW - prefix tree

UR - http://www.scopus.com/inward/record.url?scp=85200471284&partnerID=8YFLogxK

U2 - 10.1109/ICDE60146.2024.00200

DO - 10.1109/ICDE60146.2024.00200

M3 - Conference contribution

AN - SCOPUS:85200471284

T3 - Proceedings - International Conference on Data Engineering

SP - 2544

EP - 2556

BT - Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024

PB - IEEE Computer Society

Y2 - 13 May 2024 through 17 May 2024

ER -
