具有复制鲁棒性的高效数据交易估值框架

Translated title of the contribution: Efficient Data Trading Valuation Framework with Replication Robustness

Siyuan Chen, Chen Chen, Ye Yuan*, Boyang Li

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

With the emergence of data trading markets, data valuation has become a key technical challenge. Although the data Shapley value has been proven to be a fair measure of data value, its high computational cost and its vulnerability to data replication attacks severely limit its use in real-world data trading. To address these issues, this paper proposes an efficient and replication-robust framework for data valuation. To improve computational efficiency, the paper optimizes the update strategy applied after each utility calculation on a data set and introduces an efficient approximation algorithm, OA-Shapley (one for all Shapley). The algorithm updates the Shapley values of all data points from a single utility calculation, which significantly improves efficiency, and it comes with theoretical guarantees on unbiasedness and mean squared error. To counter data replication attacks, the paper shows theoretically that strict redundancy is a sufficient condition for replication robustness and proposes the CL+Shapley (Cluster+Shapley) framework. The framework achieves strict redundancy through clustering-based preprocessing, which defends against replication attacks and is decoupled from any specific data Shapley algorithm, so it applies widely. Experimental results show that OA-Shapley outperforms baseline algorithms by 12.4% (3.5%) in AUC when high-value (low-value) data points are removed, and improves the detection of invalid data by 9% to 32%. The CL+Shapley framework also shows strong robustness against replication attacks.
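
The replication problem described in the abstract can be illustrated on a toy example. The sketch below is not the paper's OA-Shapley or CL+Shapley: it computes exact Shapley values by enumerating all orderings, uses a hypothetical utility (`toy_utility`) under which every record is fully redundant with every other, and uses exact deduplication (`deduplicate`) as a simple stand-in for clustering-based preprocessing. The seller names and all function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations


def exact_shapley(points, utility):
    """Exact Shapley value of each point, averaged over all orderings."""
    n = len(points)
    values = [Fraction(0)] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        coalition = []
        prev = utility(coalition)
        for i in order:
            coalition.append(points[i])
            cur = utility(coalition)
            values[i] += Fraction(cur - prev)  # marginal contribution of point i
            prev = cur
    return [v / len(orderings) for v in values]


def toy_utility(coalition):
    # Hypothetical utility: 1 if at least one record is present, else 0,
    # so every record is redundant once any other record is in the set.
    return 1 if coalition else 0


def payout_per_seller(points, values):
    """Sum the Shapley values of the records owned by each seller."""
    totals = {}
    for (seller, _record), v in zip(points, values):
        totals[seller] = totals.get(seller, Fraction(0)) + v
    return totals


def deduplicate(points):
    """Assumed stand-in for clustering: keep one copy of each (seller, record) pair."""
    return list(dict.fromkeys(points))


honest = [("alice", "x"), ("bob", "x")]
attacked = [("alice", "x"), ("alice", "x"), ("bob", "x")]  # alice replicates her record

honest_payout = payout_per_seller(honest, exact_shapley(honest, toy_utility))
attacked_payout = payout_per_seller(attacked, exact_shapley(attacked, toy_utility))
cleaned = deduplicate(attacked)
defended_payout = payout_per_seller(cleaned, exact_shapley(cleaned, toy_utility))

print(honest_payout)    # alice 1/2, bob 1/2
print(attacked_payout)  # alice 2/3, bob 1/3
print(defended_payout)  # alice 1/2, bob 1/2
```

In this toy setting, replicating a record raises the replicating seller's total share from 1/2 to 2/3, and collapsing duplicates before valuation restores the original split. Exhaustive enumeration also needs n! orderings, which is why approximation algorithms are required at realistic data sizes.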

Original language: Chinese (Traditional)
Pages (from-to): 2532-2547
Number of pages: 16
Journal: Journal of Frontiers of Computer Science and Technology
Volume: 19
Issue number: 9
Publication status: Published - 1 Sept 2025
