TY - JOUR
T1 - Efficient Data Trading Valuation Framework with Replication Robustness
AU - Chen, Siyuan
AU - Chen, Chen
AU - Yuan, Ye
AU - Li, Boyang
N1 - Publisher Copyright: © 2025 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
PY - 2025/9/1
Y1 - 2025/9/1
N2 - With the emergence of data trading markets, data valuation has become a key technical challenge. Although the data Shapley value has been proven to be a fair measure of data value, its high computational cost and its vulnerability to data replication attacks severely limit its use in real-world data trading. To address both issues, this paper proposes an efficient and replication-robust framework for data valuation. To reduce the cost of computing data Shapley values, the paper optimizes the update strategy applied after each dataset utility calculation and introduces an efficient approximation algorithm, OA-Shapley (one for all Shapley). The algorithm updates the Shapley values of all data points from a single utility calculation, which significantly improves computational efficiency while providing theoretical guarantees on the algorithm's unbiasedness and mean squared error. To counter data replication attacks, the paper proves that strict redundancy is a sufficient condition for replication robustness and proposes the CL+Shapley (Cluster+Shapley) framework. The framework achieves strict redundancy through clustering-based preprocessing, which defends effectively against data replication attacks. Because it is decoupled from any specific data Shapley algorithm, it is widely applicable. Experimental results show that OA-Shapley outperforms baseline algorithms in AUC by 12.4% when high-value data points are removed and by 3.5% when low-value data points are removed, and that it improves the detection of invalid data by 9% to 32%. The CL+Shapley framework also demonstrates excellent robustness against replication attacks.
KW - clustering algorithm
KW - data market
KW - data Shapley value
KW - data trading
KW - replication robustness
UR - https://www.scopus.com/pages/publications/105014966449
U2 - 10.3778/j.issn.1673-9418.2412075
DO - 10.3778/j.issn.1673-9418.2412075
M3 - Article
AN - SCOPUS:105014966449
SN - 1673-9418
VL - 19
SP - 2532
EP - 2547
JO - Journal of Frontiers of Computer Science and Technology
JF - Journal of Frontiers of Computer Science and Technology
IS - 9
ER -