Nearly Optimal Rates of Privacy-Preserving Sparse Generalized Eigenvalue Problem

Lijie Hu; Zihang Xiang; Jiabin Liu; Di Wang

doi:10.1109/TKDE.2023.3330775

Nearly Optimal Rates of Privacy-Preserving Sparse Generalized Eigenvalue Problem

Lijie Hu, Zihang Xiang, Jiabin Liu, Di Wang^*

^*此作品的通讯作者

信息与电子学院

King Abdullah University of Science and Technology

科研成果: 期刊稿件 › 文章 › 同行评审

摘要

In this article, we study the (sparse) generalized eigenvalue problem (GEP), which arises in a number of modern statistical learning models, such as principal component analysis (PCA), canonical correlation analysis (CCA), Fisher's discriminant analysis (FDA), and sliced inverse regression (SIR). We provide the first study on GEP in the differential privacy (DP) model under both deterministic and stochastic settings. In the low dimensional case, we provide a ρ-Concentrated DP (CDP) method namely DP-Rayleigh Flow and show if the initial vector is close enough to the optimal vector, its output has anℓ₂-norm estimation error of Õ(dn+dⁿ²ρ) (under some mild assumptions), where d is the dimension and n is the sample size. Next, we discuss how to find such an initial parameter privately. In the high dimensional sparse case where nd≫n, we propose the DP-Truncated Rayleigh Flow method whose output could achieve an error of (slogdn+slogdⁿ²ρ) for various statistical models, where s is the sparsity of the underlying parameter. Moreover, we show that these errors in the stochastic setting are optimal up to a factor of Poly(logn) by providing the lower bounds of PCA and SIR under the statistical setting and in the CDP model. Finally, to give a separation between ϵ-DP and ρ-CDP for GEP, we also provide the lower bound Ω(dn+^d2ⁿ²^ϵ2) and Ω(slogdn+^s2^log2dⁿ²^ϵ2) of private minimax risk for PCA, under the statistical setting and ϵ-DP model, in low and high dimensional sparse case respectively. Finally, extensive experiments on both synthetic and real-world data support our previous theoretical analysis.

源语言	英语
页（从-至）	4101-4115
页数	15
期刊	IEEE Transactions on Knowledge and Data Engineering
卷	36
期	8
DOI	https://doi.org/10.1109/TKDE.2023.3330775
出版状态	已出版 - 2024

访问文件

10.1109/TKDE.2023.3330775

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{fbc3175124dd4c41a3cd598360811bb3,

title = "Nearly Optimal Rates of Privacy-Preserving Sparse Generalized Eigenvalue Problem",

abstract = "In this article, we study the (sparse) generalized eigenvalue problem (GEP), which arises in a number of modern statistical learning models, such as principal component analysis (PCA), canonical correlation analysis (CCA), Fisher's discriminant analysis (FDA), and sliced inverse regression (SIR). We provide the first study on GEP in the differential privacy (DP) model under both deterministic and stochastic settings. In the low dimensional case, we provide a ρ-Concentrated DP (CDP) method namely DP-Rayleigh Flow and show if the initial vector is close enough to the optimal vector, its output has anℓ2-norm estimation error of {\~O}(dn+dn2ρ) (under some mild assumptions), where d is the dimension and n is the sample size. Next, we discuss how to find such an initial parameter privately. In the high dimensional sparse case where nd≫n, we propose the DP-Truncated Rayleigh Flow method whose output could achieve an error of (slogdn+slogdn2ρ) for various statistical models, where s is the sparsity of the underlying parameter. Moreover, we show that these errors in the stochastic setting are optimal up to a factor of Poly(logn) by providing the lower bounds of PCA and SIR under the statistical setting and in the CDP model. Finally, to give a separation between ϵ-DP and ρ-CDP for GEP, we also provide the lower bound Ω(dn+d2n2ϵ2) and Ω(slogdn+s2log2dn2ϵ2) of private minimax risk for PCA, under the statistical setting and ϵ-DP model, in low and high dimensional sparse case respectively. Finally, extensive experiments on both synthetic and real-world data support our previous theoretical analysis.",

keywords = "Dimension reduction, generalized eigenvalue problem, privacy, sliced inverse regression",

author = "Lijie Hu and Zihang Xiang and Jiabin Liu and Di Wang",

note = "Publisher Copyright: {\textcopyright} 1989-2012 IEEE.",

year = "2024",

doi = "10.1109/TKDE.2023.3330775",

language = "English",

volume = "36",

pages = "4101--4115",

journal = "IEEE Transactions on Knowledge and Data Engineering",

issn = "1041-4347",

publisher = "IEEE Computer Society",

number = "8",

}

TY - JOUR

T1 - Nearly Optimal Rates of Privacy-Preserving Sparse Generalized Eigenvalue Problem

AU - Hu, Lijie

AU - Xiang, Zihang

AU - Liu, Jiabin

AU - Wang, Di

PY - 2024

Y1 - 2024

N2 - In this article, we study the (sparse) generalized eigenvalue problem (GEP), which arises in a number of modern statistical learning models, such as principal component analysis (PCA), canonical correlation analysis (CCA), Fisher's discriminant analysis (FDA), and sliced inverse regression (SIR). We provide the first study on GEP in the differential privacy (DP) model under both deterministic and stochastic settings. In the low dimensional case, we provide a ρ-Concentrated DP (CDP) method namely DP-Rayleigh Flow and show if the initial vector is close enough to the optimal vector, its output has anℓ2-norm estimation error of Õ(dn+dn2ρ) (under some mild assumptions), where d is the dimension and n is the sample size. Next, we discuss how to find such an initial parameter privately. In the high dimensional sparse case where nd≫n, we propose the DP-Truncated Rayleigh Flow method whose output could achieve an error of (slogdn+slogdn2ρ) for various statistical models, where s is the sparsity of the underlying parameter. Moreover, we show that these errors in the stochastic setting are optimal up to a factor of Poly(logn) by providing the lower bounds of PCA and SIR under the statistical setting and in the CDP model. Finally, to give a separation between ϵ-DP and ρ-CDP for GEP, we also provide the lower bound Ω(dn+d2n2ϵ2) and Ω(slogdn+s2log2dn2ϵ2) of private minimax risk for PCA, under the statistical setting and ϵ-DP model, in low and high dimensional sparse case respectively. Finally, extensive experiments on both synthetic and real-world data support our previous theoretical analysis.

AB - In this article, we study the (sparse) generalized eigenvalue problem (GEP), which arises in a number of modern statistical learning models, such as principal component analysis (PCA), canonical correlation analysis (CCA), Fisher's discriminant analysis (FDA), and sliced inverse regression (SIR). We provide the first study on GEP in the differential privacy (DP) model under both deterministic and stochastic settings. In the low dimensional case, we provide a ρ-Concentrated DP (CDP) method namely DP-Rayleigh Flow and show if the initial vector is close enough to the optimal vector, its output has anℓ2-norm estimation error of Õ(dn+dn2ρ) (under some mild assumptions), where d is the dimension and n is the sample size. Next, we discuss how to find such an initial parameter privately. In the high dimensional sparse case where nd≫n, we propose the DP-Truncated Rayleigh Flow method whose output could achieve an error of (slogdn+slogdn2ρ) for various statistical models, where s is the sparsity of the underlying parameter. Moreover, we show that these errors in the stochastic setting are optimal up to a factor of Poly(logn) by providing the lower bounds of PCA and SIR under the statistical setting and in the CDP model. Finally, to give a separation between ϵ-DP and ρ-CDP for GEP, we also provide the lower bound Ω(dn+d2n2ϵ2) and Ω(slogdn+s2log2dn2ϵ2) of private minimax risk for PCA, under the statistical setting and ϵ-DP model, in low and high dimensional sparse case respectively. Finally, extensive experiments on both synthetic and real-world data support our previous theoretical analysis.

KW - Dimension reduction

KW - generalized eigenvalue problem

KW - privacy

KW - sliced inverse regression

UR - http://www.scopus.com/inward/record.url?scp=85177043333&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2023.3330775

DO - 10.1109/TKDE.2023.3330775

M3 - Article

AN - SCOPUS:85177043333

SN - 1041-4347

VL - 36

SP - 4101

EP - 4115

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

IS - 8

ER -

Nearly Optimal Rates of Privacy-Preserving Sparse Generalized Eigenvalue Problem

摘要

访问文件

其它文件与链接

指纹

引用此