TY - GEN
T1 - Lightweight Local Differential Privacy For High-dimensional Data
AU - Zheng, Tianyu
AU - Zhao, Xiaolin
AU - Liu, Zhenyan
AU - Song, Ce
AU - Li, Yiyu
AU - Huang, Yukun
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Frequency publication, as a data release mechanism, typically involves counting and aggregating data. When combined with differential privacy, it adds carefully calibrated randomness during data transmission and publication to reduce the risk of personal privacy leakage. However, excessively large user response ranges or flawed encoding schemes can expand the dimensionality of locally desensitized data, causing high-dimensional problems such as difficulty in model fitting, sharply increased communication overhead, and higher computational complexity. This paper proposes two solutions. First, the Succinct Histograms Based on Encoding Optimization (OSH) algorithm uses orthogonal matrix encoding to address the accuracy degradation common in conventional sampling-based methods. Second, the Local, Private, Efficient Protocols Succinct Histograms Based on non-cryptographic Hash Algorithm (NCHOSH) uses non-cryptographic hashing for encoding, which improves encoding efficiency, resolves the collision issues of prior approaches, and enables data desensitization when the candidate values are unknown. Both methods are lightweight because they reduce dimensionality through mapping, which significantly lowers the communication and computation costs of processing high-dimensional data. Experimental comparisons with mainstream algorithms show that OSH and NCHOSH perform better on multiple metrics.
KW - data desensitization
KW - frequency publication
KW - local differential privacy
UR - https://www.scopus.com/pages/publications/105019322255
U2 - 10.1109/CISAT66811.2025.11181983
DO - 10.1109/CISAT66811.2025.11181983
M3 - Conference contribution
AN - SCOPUS:105019322255
T3 - 2025 8th International Conference on Computer Information Science and Application Technology, CISAT 2025
SP - 175
EP - 182
BT - 2025 8th International Conference on Computer Information Science and Application Technology, CISAT 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th International Conference on Computer Information Science and Application Technology, CISAT 2025
Y2 - 11 July 2025 through 13 July 2025
ER -