Mind the Boundary: Coreset Selection via Reconstructing the Decision Boundary

Shuo Yang, Zhe Cao, Sheng Guo, Ruiheng Zhang, Ping Luo, Shengping Zhang*, Liqiang Nie

*Corresponding author for this work

Research output: Contribution to journal, conference article, peer-reviewed

Abstract

Existing paradigms of pushing the state of the art require exponentially more training data in many fields. Coreset selection seeks to mitigate this growing demand by identifying the most efficient subset of training data. In this paper, we delve into geometry-based coreset methods and preliminarily link the geometry of data distribution with models' generalization capability in theoretics. Leveraging these theoretical insights, we propose a novel coreset construction method by selecting training samples to reconstruct the decision boundary of a deep neural network learned on the full dataset. Extensive experiments across various popular benchmarks demonstrate the superiority of our method over multiple competitors. For the first time, our method achieves a 50% data pruning rate on the ImageNet-1K dataset while sacrificing less than 1% in accuracy. Additionally, we showcase and analyze the remarkable cross-architecture transferability of the coresets derived from our approach.

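Original language: English
Pages (from-to): 55948-55960
Number of pages: 13
Journal: Proceedings of Machine Learning Research
Volume: 235
Publication status: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: 21 Jul 2024 to 27 Jul 2024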
