TY - JOUR
T1 - An SVM Training Mechanism for Secure Sharing of Credit Reporting Data
AU - Shen, Meng
AU - Zhang, Jie
AU - Zhu, Liehuang
AU - Xu, Ke
AU - Zhang, Kaixiang
AU - Li, Huizhong
AU - Tang, Xiangyun
N1 - Publisher Copyright:
© 2021, Science Press. All rights reserved.
PY - 2021/4
Y1 - 2021/4
N2 - In the credit reporting industry, the richness and diversity of credit data are extremely important for the development of credit evaluation. However, the credit data owned by credit reporting agencies, especially small ones, suffer from issues such as incomplete content, incomplete types, and an insufficient number of instances, so data sharing among credit reporting agencies is very important. In practical application scenarios, credit data are highly valuable, strongly private, and easy to copy without authorization; these characteristics pose great security challenges when credit data are shared. To solve this problem, this paper proposes an SVM training mechanism for the secure sharing of credit data and designs a system prototype based on it, as shown in Figure 3 of the manuscript. The mechanism is built on a consortium blockchain and the additively homomorphic encryption scheme Paillier. Owing to the decentralization of blockchain technology, the mechanism does not need to rely on any trusted third party during model training. At the same time, through secure collaborative computation among credit reporting agencies, the mechanism meets the credit evaluation needs of the model trainer without revealing data privacy. Firstly, the shared data are encrypted and stored on the blockchain to ensure that they are secure and cannot be tampered with; this process is completed through smart contracts, without a third party acting as a data-sharing platform. Secondly, based on the additively homomorphic Paillier algorithm, this paper implements the secure operations required in SVM training with stochastic gradient descent, and designs a secure SVM training algorithm around that training process; the algorithm flow is shown in Algorithm 2.
Based on this algorithm, the credit reporting agencies participating in the computation can operate directly on the shared encrypted data, ensuring that the model trainer can train a credit evaluation model without the original data being leaked. During training, only the data provider and the model trainer participate in the computation; computing on encrypted data requires no assistance from a third party, which avoids the privacy-leakage risk that introducing one would bring. The proposed mechanism is verified by a security analysis: under the threat model, neither the model parameters of the model trainer nor the original data of the data provider suffer privacy leakage. This paper also verifies the usability and performance of the proposed mechanism through experiments on real-world datasets. The experimental results show that, compared with a model trained normally on the plaintext dataset, the model trained by the proposed mechanism loses no accuracy and its training time is acceptable. To further evaluate the advantages of the proposed scheme, a comparative experiment with similar privacy-preserving training schemes is carried out; the results show that the computation time of this mechanism on the experimental dataset is less than 5% of that of the comparison mechanism. Moreover, owing to its decentralized training, the scheme has good prospects in practical application scenarios.
KW - Consortium blockchain
KW - Credit data
KW - Homomorphic encryption
KW - Privacy preserving
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85104473021&partnerID=8YFLogxK
U2 - 10.11897/SP.J.1016.2021.00696
DO - 10.11897/SP.J.1016.2021.00696
M3 - Article
AN - SCOPUS:85104473021
SN - 0254-4164
VL - 44
SP - 696
EP - 708
JO - Jisuanji Xuebao/Chinese Journal of Computers
JF - Jisuanji Xuebao/Chinese Journal of Computers
IS - 4
ER -