Abstract
With the emergence of data trading markets, data valuation has become a key technical challenge. Although the data Shapley value has been proven to be a fair measure of data value, two problems severely limit its use in real-world data trading: its high computational cost and its vulnerability to data replication attacks. To address these issues, this paper proposes an efficient, replication-robust data valuation framework. To make the data Shapley value cheaper to compute, the paper optimizes how value estimates are updated after each utility evaluation on a data subset. It then introduces an efficient approximation algorithm, OA-Shapley (One-for-All Shapley). OA-Shapley updates the Shapley values of all data points from a single utility evaluation, which substantially improves computational efficiency. The paper also gives theoretical guarantees on the algorithm's unbiasedness and mean squared error. To counter data replication attacks, the paper proves that strict redundancy is a sufficient condition for replication robustness and proposes the CL+Shapley (Cluster+Shapley) framework. CL+Shapley enforces strict redundancy through clustering-based preprocessing, which effectively defends against replication attacks. Because the framework is decoupled from any particular data Shapley algorithm, it is widely applicable. Experimental results show that OA-Shapley exceeds the baseline algorithms in AUC by 12.4% when high-value data points are removed and by 3.5% when low-value data points are removed. It also improves the detection of invalid data by 9% to 32%. The CL+Shapley framework also shows strong robustness against replication attacks.
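The abstract does not specify OA-Shapley's update rule, so the sketch below is not that algorithm. It illustrates the general idea the abstract names: one utility evaluation U(S) updates every point's estimate. The sketch uses an importance-weighted estimator assumed here for illustration. First draw a subset size s uniformly from {0, 1, ..., n}, then draw S uniformly among subsets of that size, so that P(S) = 1 / ((n + 1) · C(n, s)). The estimator then adds (n + 1)·U(S)/s to every i in S and subtracts (n + 1)·U(S)/(n − s) from every i not in S. Dividing the Shapley coefficients by P(S) yields exactly these weights, so each point's running mean is unbiased. The function names `one_sample_update`, `estimate_shapley` and `exact_shapley` are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import Callable, FrozenSet, List

# Hypothetical utility: maps a subset of point indices to a score (e.g. validation accuracy).
Utility = Callable[[FrozenSet[int]], float]


def one_sample_update(n: int, S: FrozenSet[int], u_S: float) -> List[float]:
    """Per-point contributions derived from a single evaluation u_S = U(S).

    Assumes |S| = s was drawn uniformly from {0, ..., n} and S uniformly among
    subsets of size s, so P(S) = 1 / ((n + 1) * C(n, s)).
    """
    s = len(S)
    contrib = []
    for i in range(n):
        if i in S:  # here s >= 1, so the division is safe
            contrib.append((n + 1) * u_S / s)
        else:  # here s <= n - 1, so the division is safe
            contrib.append(-(n + 1) * u_S / (n - s))
    return contrib


def estimate_shapley(n: int, utility: Utility, num_samples: int, seed: int = 0) -> List[float]:
    """Monte Carlo estimate that spends one utility evaluation per sample on all n points."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(num_samples):
        s = rng.randint(0, n)  # subset size, uniform over 0..n inclusive
        S = frozenset(rng.sample(range(n), s))
        u_S = utility(S)  # the only utility call in this iteration
        for i, c in enumerate(one_sample_update(n, S, u_S)):
            totals[i] += c
    return [t / num_samples for t in totals]


def exact_shapley(n: int, utility: Utility) -> List[float]:
    """Exact Shapley values by enumeration (exponential cost; for checking small cases)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = 1.0 / (n * math.comb(n - 1, k))
            for subset in combinations(others, k):
                S = frozenset(subset)
                phi[i] += weight * (utility(S | {i}) - utility(S))
    return phi
```

Take U(S) = (Σ_{j∈S} w_j)² + c as an example. Its exact Shapley values are w_i · Σ_j w_j, and `exact_shapley` reproduces them. The sampled estimates converge to the same values as `num_samples` grows. The weights become large when s is near 0 or n, so this simple estimator can have high variance. The mean-squared-error guarantees in the abstract refer to the paper's own algorithm, not to this sketch.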
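The abstract also describes clustering as a preprocessing step placed in front of an arbitrary data Shapley routine. It does not define strict redundancy or the clustering method, so the following sketch rests on three illustrative assumptions. Clustering is a greedy Euclidean-radius grouping. A coalition of clusters is scored only on the clusters' representatives. Each cluster's value is split equally among its members. The Shapley routine is passed in as a parameter to mirror the decoupling the abstract mentions. The names `greedy_cluster`, `permutation_shapley` and `cluster_shapley` are hypothetical.

```python
import math
from itertools import permutations
from typing import Callable, FrozenSet, List, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical utility over a list of (representative) points, e.g. model accuracy.
PointUtility = Callable[[List[Point]], float]
# Any Shapley routine over k players and a coalition utility can be plugged in.
ShapleyFn = Callable[[int, Callable[[FrozenSet[int]], float]], List[float]]


def greedy_cluster(points: Sequence[Point], radius: float) -> Tuple[List[Point], List[int]]:
    """Assign each point to the first representative within `radius`, else open a new cluster."""
    reps: List[Point] = []
    labels: List[int] = []
    for p in points:
        for c, r in enumerate(reps):
            if math.dist(p, r) <= radius:
                labels.append(c)
                break
        else:  # no representative is close enough: p founds a new cluster
            reps.append(p)
            labels.append(len(reps) - 1)
    return reps, labels


def permutation_shapley(k: int, utility: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values by averaging marginal contributions over all k! orderings."""
    phi = [0.0] * k
    orders = list(permutations(range(k)))
    for order in orders:
        coalition: set = set()
        prev = utility(frozenset())
        for c in order:
            coalition.add(c)
            cur = utility(frozenset(coalition))
            phi[c] += cur - prev
            prev = cur
    return [v / len(orders) for v in phi]


def cluster_shapley(points: Sequence[Point], utility: PointUtility, radius: float,
                    shapley_fn: ShapleyFn = permutation_shapley) -> List[float]:
    """Treat clusters as players, then split each cluster's value equally among its members."""
    reps, labels = greedy_cluster(points, radius)
    # Cluster-level game: a coalition of clusters is scored on its representatives only,
    # so extra copies inside a cluster cannot change any coalition's utility.
    cluster_values = shapley_fn(len(reps), lambda C: utility([reps[c] for c in sorted(C)]))
    sizes = [labels.count(c) for c in range(len(reps))]
    return [cluster_values[c] / sizes[c] for c in labels]
```

Consider an additive utility under this sketch. A seller who submits three copies of a point still receives, in total, the value of one cluster. A point-level Shapley computation on the same four points would instead credit the copies three times over.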
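| Translated title of the contribution | Efficient Data Trading Valuation Framework with Replication Robustness |
|---|---|
| Original title | 具有复制鲁棒性的高效数据交易估值框架 |
| Original language | Traditional Chinese |
| Pages (from-to) | 2532-2547 |
| Number of pages | 16 |
| Journal | Journal of Frontiers of Computer Science and Technology |
| Volume | 19 |
| Issue | 9 |
| DOI | |
| Publication status | Published - 1 Sep 2025 |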
Keywords
- clustering algorithm
- data Shapley value
- data market
- data trading
- replication robustness