TY - GEN
T1 - An Innovative Subsampling Approach for Efficient SVM Training with Large Datasets
AU - Sun, Shuo
AU - Dai, Wenlin
AU - Wang, Dianpeng
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Support vector machines (SVMs) are widely recognized for their effectiveness in handling classification problems, owing to their solid theoretical foundation and excellent generalization performance. However, despite these advantages, SVMs have a significant drawback in the form of high computational time, which increases with the size of the training dataset. To address this limitation, this article presents a novel adaptive sequential subsampling method designed to accelerate the training process of SVMs. The proposed method consists of two stages. In the first stage, a space-filling design is employed to group samples into cells. Then, an initial pilot SVM model is trained by utilizing the centroids and corresponding labels of these cells. In the second stage, an adaptive sequential stratified sampling method, based on the distance between each cell and the hyperplane, is employed to select informative samples, thereby enhancing the SVM model. Numerical studies show that our approach achieves classification accuracy that is comparable to or even better than that of basic SVM, while requiring only approximately 1% of the CPU time. Consequently, our algorithm is a more efficient choice for large-scale data applications.
KW - Adaptive subsampling
KW - Distance-based
KW - Space-filling
KW - Support vector machines
UR - https://www.scopus.com/pages/publications/105037365489
U2 - 10.1109/ICICML67980.2025.11333554
DO - 10.1109/ICICML67980.2025.11333554
M3 - Conference contribution
AN - SCOPUS:105037365489
T3 - 2025 4th International Conference on Image Processing, Computer Vision and Machine Learning, ICICML 2025
SP - 1798
EP - 1807
BT - 2025 4th International Conference on Image Processing, Computer Vision and Machine Learning, ICICML 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 4th International Conference on Image Processing, Computer Vision and Machine Learning, ICICML 2025
Y2 - 21 November 2025 through 23 November 2025
ER -