TY - JOUR
T1 - An SVM Training Mechanism for Secure Sharing of Credit Reporting Data
AU - Shen, Meng
AU - Zhang, Jie
AU - Zhu, Liehuang
AU - Xu, Ke
AU - Zhang, Kaixiang
AU - Li, Huizhong
AU - Tang, Xiangyun
N1 - Publisher Copyright:
© 2021, Science Press. All rights reserved.
PY - 2021/4
Y1 - 2021/4
N2 - In the credit reporting industry, the richness and diversity of credit data are extremely important for the development of credit evaluation. However, the credit data owned by credit reporting agencies, especially small ones, suffer from issues such as incomplete content, incomplete types, and an insufficient number of instances, so data sharing among credit reporting agencies is very important. In practical application scenarios, credit data are highly valuable, strongly private, and easy to copy without authorization; these characteristics pose great security challenges when credit data are shared. To solve this problem, this paper proposes an SVM training mechanism for the secure sharing of credit data and designs a system prototype based on it, as shown in Figure 3 of the manuscript. The mechanism is built on a consortium blockchain and the additively homomorphic encryption scheme Paillier. Owing to the decentralization of blockchain technology, the mechanism does not need to rely on any trusted third party during model training. At the same time, through secure collaborative computation among credit reporting agencies, the mechanism meets the credit evaluation needs of the model trainer without revealing data privacy. Firstly, the shared data are encrypted and stored on the blockchain to ensure that they are secure and cannot be tampered with; this process is completed through smart contracts, without a third party acting as a data-sharing platform. Secondly, based on the additively homomorphic Paillier algorithm, this paper implements the secure operations required in SVM training with stochastic gradient descent, and designs a secure SVM training algorithm around that training process; the algorithm flow is shown in Algorithm 2.
Based on this algorithm, the credit reporting agencies participating in the computation can operate directly on the shared encrypted data, ensuring that the model trainer can train a credit evaluation model without the original data being leaked. During training, only the data provider and the model trainer participate in the computation; computing on encrypted data requires no assistance from a third party, which avoids the privacy-leakage risk that introducing one would bring. The proposed mechanism is verified by a security analysis: under the threat model, neither the model parameters of the model trainer nor the original data of the data provider suffer privacy leakage. This paper also verifies the usability and performance of the proposed mechanism through experiments on real-world datasets. The experimental results show that, compared with a model trained normally on the plaintext dataset, the model trained by the proposed mechanism loses no accuracy and its training time is acceptable. To further evaluate the advantages of the proposed scheme, a comparative experiment with similar privacy-preserving training schemes is carried out; the results show that the computation time of this mechanism on the experimental dataset is less than 5% of that of the comparison mechanism. Moreover, owing to its decentralized training, the scheme has good prospects in practical application scenarios.
KW - Consortium blockchain
KW - Credit data
KW - Homomorphic encryption
KW - Privacy preserving
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85104473021&partnerID=8YFLogxK
U2 - 10.11897/SP.J.1016.2021.00696
DO - 10.11897/SP.J.1016.2021.00696
M3 - Article
AN - SCOPUS:85104473021
SN - 0254-4164
VL - 44
SP - 696
EP - 708
JO - Jisuanji Xuebao/Chinese Journal of Computers
JF - Jisuanji Xuebao/Chinese Journal of Computers
IS - 4
ER -