TY - JOUR
T1 - Privacy-preserving Sparse Generalized Eigenvalue Problem
AU - Hu, Lijie
AU - Xiang, Zihang
AU - Liu, Jiabin
AU - Wang, Di
N1 - Publisher Copyright:
Copyright © 2023 by the author(s)
PY - 2023
Y1 - 2023
N2 - In this paper we study the (sparse) Generalized Eigenvalue Problem (GEP), which arises in a number of modern statistical learning models, such as principal component analysis (PCA), canonical correlation analysis (CCA), Fisher's discriminant analysis (FDA) and sliced inverse regression (SIR). We provide the first study of GEP in the differential privacy (DP) model under both deterministic and stochastic settings. In the low dimensional case, we provide a ρ-Concentrated DP (CDP) method, namely DP-Rayleigh Flow, and show that if the initial vector is close enough to the optimal vector, its output has an ℓ2-norm estimation error of Õ(d/n + d/(n²ρ)) (under some mild assumptions), where d is the dimension and n is the sample size. Next, we discuss how to find such an initial parameter privately. In the high dimensional sparse case where d ≫ n, we propose the DP-Truncated Rayleigh Flow method, whose output can achieve an error of Õ(s log d/n + s log d/(n²ρ)) for various statistical models, where s is the sparsity of the underlying parameter. Moreover, we show that these errors in the stochastic setting are optimal up to a factor of Poly(log n) by providing lower bounds for PCA and SIR under the statistical setting and in the CDP model. Finally, to give a separation between ∊-DP and ρ-CDP for GEP, we also provide lower bounds of Ω(d/n + d²/(n²∊²)) and Ω(s log d/n + s² log²d/(n²∊²)) on the private minimax risk for PCA, under the statistical setting and the ∊-DP model, in the low dimensional and high dimensional sparse cases, respectively.
AB - In this paper we study the (sparse) Generalized Eigenvalue Problem (GEP), which arises in a number of modern statistical learning models, such as principal component analysis (PCA), canonical correlation analysis (CCA), Fisher's discriminant analysis (FDA) and sliced inverse regression (SIR). We provide the first study of GEP in the differential privacy (DP) model under both deterministic and stochastic settings. In the low dimensional case, we provide a ρ-Concentrated DP (CDP) method, namely DP-Rayleigh Flow, and show that if the initial vector is close enough to the optimal vector, its output has an ℓ2-norm estimation error of Õ(d/n + d/(n²ρ)) (under some mild assumptions), where d is the dimension and n is the sample size. Next, we discuss how to find such an initial parameter privately. In the high dimensional sparse case where d ≫ n, we propose the DP-Truncated Rayleigh Flow method, whose output can achieve an error of Õ(s log d/n + s log d/(n²ρ)) for various statistical models, where s is the sparsity of the underlying parameter. Moreover, we show that these errors in the stochastic setting are optimal up to a factor of Poly(log n) by providing lower bounds for PCA and SIR under the statistical setting and in the CDP model. Finally, to give a separation between ∊-DP and ρ-CDP for GEP, we also provide lower bounds of Ω(d/n + d²/(n²∊²)) and Ω(s log d/n + s² log²d/(n²∊²)) on the private minimax risk for PCA, under the statistical setting and the ∊-DP model, in the low dimensional and high dimensional sparse cases, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85165177202&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85165177202
SN - 2640-3498
VL - 206
SP - 5052
EP - 5062
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023
Y2 - 25 April 2023 through 27 April 2023
ER -