TY - JOUR
T1 - PrivMiner
T2 - a similar-first approach to frequent itemset mining under local differential privacy
AU - Li, Yanhui
AU - Huang, Chen
AU - Cheng, Mengyuan
AU - Lv, Tianci
AU - Zhao, Yuxin
AU - Sun, Yongjiao
AU - Yuan, Ye
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
PY - 2025/3
Y1 - 2025/3
N2 - The discovery of frequent itemset can serve valuable economic and research purposes. Mining such data, however, presents privacy challenges. Local differential privacy (LDP) has been established as a strong and rigorous privacy scheme for collecting sensitive information from users. Due to the inherent high-dimensionality and heterogeneity of set-valued data, it is quite difficult to mine frequent itemsets under LDP. Existing solutions for this issue are either limited to single items or rely on sampling-based frequency oracle protocols, leading to poor overall result quality. Motivated by this, we present PrivMiner, a two-phase LDP protocol which directly collects user record information comprehensively. While the methodology of directly collecting an entire user record is an obvious choice, obtaining an accurate record under LDP is highly challenging. PrivMiner accomplishes this with a novel technique called the Group-wise Estimation Mechanism by exploiting the similarity between records. PrivMiner needs significantly less perturbations than previous methods, and it achieves higher overall result quality, even for millions of items. Extensive experiments conducted on real-world datasets demonstrate the effectiveness of our proposed method and its advantages over existing solutions. In particular, PrivMiner improves the error of estimation count by at least one order of magnitude.
AB - The discovery of frequent itemset can serve valuable economic and research purposes. Mining such data, however, presents privacy challenges. Local differential privacy (LDP) has been established as a strong and rigorous privacy scheme for collecting sensitive information from users. Due to the inherent high-dimensionality and heterogeneity of set-valued data, it is quite difficult to mine frequent itemsets under LDP. Existing solutions for this issue are either limited to single items or rely on sampling-based frequency oracle protocols, leading to poor overall result quality. Motivated by this, we present PrivMiner, a two-phase LDP protocol which directly collects user record information comprehensively. While the methodology of directly collecting an entire user record is an obvious choice, obtaining an accurate record under LDP is highly challenging. PrivMiner accomplishes this with a novel technique called the Group-wise Estimation Mechanism by exploiting the similarity between records. PrivMiner needs significantly less perturbations than previous methods, and it achieves higher overall result quality, even for millions of items. Extensive experiments conducted on real-world datasets demonstrate the effectiveness of our proposed method and its advantages over existing solutions. In particular, PrivMiner improves the error of estimation count by at least one order of magnitude.
KW - Frequent itemset mining
KW - Group-wise estimation mechanism
KW - Local differential privacy
KW - Privacy protection
UR - http://www.scopus.com/inward/record.url?scp=105000132482&partnerID=8YFLogxK
U2 - 10.1007/s11280-025-01339-x
DO - 10.1007/s11280-025-01339-x
M3 - Article
AN - SCOPUS:105000132482
SN - 1386-145X
VL - 28
JO - World Wide Web
JF - World Wide Web
IS - 2
M1 - 25
ER -